Preface


Artificial intelligence is now everywhere and fuels both industry and daily life all
over the world. We are in the era of “big data,” in which huge amounts of information
can be obtained that are too cumbersome for people to process themselves. In various
areas, such as computer vision and social media, these data also carry complex
correlations. For example, the complex correlations among pixels in an image reveal
its semantic information, and different types of correlations among social posts
indicate the users’ emotions. Therefore, developing effective AI methods to exploit
such complex data correlations has become an urgent but challenging task.

Graphs have been widely used to formulate data correlations. A graph is a nonlinear
data structure composed of a set of vertices and a set of edges, where the edges
represent the pairwise correlations among vertices. Graph learning and graph neural
networks have attracted much attention in both research and industry and have become
very active topics in recent years. However, the world is far more complex than just
pairwise connections, and thus graph-based methods still have limitations in
high-order correlation modeling.

The hypergraph, as a generalization of the graph, is able to formulate such high-order
correlations among the data and has been investigated over the last decades. Recent
years have witnessed a surge of research on hypergraph-related AI methods,
which have been used in computer vision, social media analysis, and other fields. We
noticed that there was still no theoretical book that systematically introduces the
recent achievements in this field, and we therefore started preparing this book. We
summarize these efforts as a new computing paradigm, called hypergraph computation,
which formulates the high-order correlations underlying the data using a hypergraph
and then conducts semantic computing on the hypergraph for different applications.

In this book, we introduce recent progress in hypergraph computation, from 
hypergraph modeling to hypergraph neural networks. The applications of hyper- 
graph computation are also discussed. We also summarize the recent achievements 
and useful tools in hypergraph computation. This book can be regarded as both a 
theoretical book and a manual on how to use hypergraph computation in practice. 


Book Organization 


This book includes 13 chapters in 3 parts. The first part introduces the fundamental
knowledge of hypergraph computation. In this part, Chap. 1 describes the basic
knowledge, applications, and history of hypergraphs. The mathematical foundations
of hypergraph are introduced in Chap. 2. Three general paradigms of hypergraph 
computation are provided in Chap. 3. 

The second part focuses on hypergraph modeling and learning techniques. The 
first step of hypergraph computation is to construct a hypergraph to formulate the 
high-order correlations among data, which is provided in Chap. 4. Typical hyper- 
graph computation tasks are then provided in Chap. 5, including label propagation, 
data clustering, cost-sensitive learning, and link prediction. We further introduce the 
hypergraph structure evolution methods for hypergraph optimization in Chap. 6. The 
neural networks on hypergraph are introduced in Chap. 7. The practical applications 
of hypergraph computation require the capability of handling large-scale data. 
Therefore, we give an extensive introduction to large-scale hypergraph computation 
in Chap. 8. 

The third part introduces the applications of hypergraph computation in several 
fields, including social media analysis in Chap. 9, medical and biological applications
in Chap. 10, and computer vision in Chap. 11. This part also introduces
the DeepHypergraph library, a hypergraph computation library based on Python, 
in Chap. 12, and the future advancement of hypergraph computation research in 
Chap. 13. 


Prerequisites 


This book is designed for advanced undergraduate and graduate students, postdoc- 
toral researchers, lecturers, researchers, and industrial engineers, as well as anyone 
interested in AI, especially hypergraph computation. Readers are expected to
have basic knowledge of probability, linear algebra, and machine learning. Prior
knowledge of graph theory is helpful but not mandatory. Besides
the theoretical part from Chap. 3 to Chap. 8, we have also provided a series of
applications from Chap. 9 to Chap. 11, which can be used as guidelines for the
deployment of hypergraph computation in practice.


Chapter 1
Introduction


Abstract High-order correlations among data exist widely in various practical
applications. Compared with the simple graph, which can only model the pairwise
relationship between two subjects, the hypergraph is a flexible and representative
model for formulating high-order correlations. Based on the hypergraph model, there
have been many efforts to design computation frameworks and analyze high-order
correlations. In this chapter, we briefly introduce hypergraph computation,
including its background, definition, history, recent challenges, and objectives.


1.1 Background 


The basic elements of many natural and artificial systems depend on
each other, which calls for correlation modeling and analytic methods to study them.
Graphs are all around us, and in general all the
objects in the real world can be characterized by their connections with other objects.
These connections can be described as a graph, which is a common data structure
in many cases. For example, graphs can depict the roads in a city, where each road
is represented by an edge that shows the spatial connection between two locations.
Graphs are also employed in airline route maps, in which each vertex is an airport
and each edge is a route between two airports.

Recently, the most challenging data processing problems have come from connected
data, not just from discrete data. How to exploit the underlying
connections behind the data has become an urgent and important task in many
applications. Generally, graphs have been used to formulate such correlations among
data. A graph is a nonlinear data structure composed of a set of vertices
and a set of edges. Here, the vertices represent the subjects to be analyzed, and
the edges are the lines connecting pairs of vertices. Figure 1.1
shows an example of a graph.

Fig. 1.1 An example of a graph

In this common way of modeling pairwise correlations among data, the components
in a system are represented by the vertices of a graph, and the associations
between components are described by the edges. In this way, the association
pattern is abstracted by the topological structure of the graph. In the past decades,
it was not easy to apply graph theory in practice because of the limitation of
computing power. In recent years, with the advancement of information technology 
and computing power, graph theory has demonstrated its practical values. As scales 
of data grow, scientists have come up with the concept of network science. The 
study of network science can be applied in various fields. For example, by studying 
the connection relationship between terminals on the Internet, the efficiency of data 
transmission in a network can be estimated. The study of interpersonal relationships
can help understand the way people communicate with each other, disseminate
information, and form communities. Studying the transmission chains of infectious
diseases can help predict risks in time and thus interrupt transmission and prevent
the spread of disease. People have also found that many biological, social, information, and
other real networks have nontrivial structural patterns in the connections among 
their elements. These patterns reflect meaningful features of the whole network. For 
example, the small-world phenomenon (the average path length in the network does 
not increase significantly with the increase of the network size) widely exists in 
social networks [1]. Another example is the scale-free network [2], in which the vertex
degree distribution follows a power law; this phenomenon has been observed
in some biological metabolic networks [3].

It is noted that the world is far more complex than just pairwise connections. 
Typical examples include social networks, protein-protein interaction networks, 
and brain networks. In social networks, the individual characteristics of users are 
related to the interactive patterns among users. The users with similar characteristics 
are more likely to connect with each other to form a social group. The social 
relationships of users also affect their profiles. We notice that the
correlations among these users are not just pairwise connections but also group-like
connections, which are more complex than pairwise connections. Figure 1.2
shows an example of social connections, in which each user could have different
types of connections with two or more other users or items.

Fig. 1.2 An illustration of pairwise correlations and high-order correlations between/among users: (a) pairwise correlations, (b) high-order correlations

In human brain networks, the cerebral cortex contains more than 10^11 neurons,
and a cluster of neurons with similar functions and connections forms a nucleus. The
nuclei can be further divided into different brain regions, resulting in a multilevel
and multi-scale complex brain network. For example, the whole brain map includes
the Insula and Cingulate Gyri, Frontal Lobe, Occipital Lobe, Parietal Lobe, and other
regions, which can be further divided into the 90 brain regions provided in the AAL
atlas [4], such as the Hippocampus and Parahippocampus. Each neuron can have more
than 10,000 synapses, which can connect the neurons in the brain to other neurons in
the rest of the body or connect the neurons to the muscles. The connections among
the neurons are complex and hard to formulate in a graph, although the graph is a
typical way to model such correlations in the brain.

Such complex correlations, i.e., high-order correlations rather than pairwise
ones, are very common in real-world data. To study these complex systems, it
is necessary to characterize and analyze high-order relationships between their 
elements. Empirical studies have shown that the correlation patterns of a system 
often play an important role in functions of the system. In recent years, more 
researchers have begun to pay attention to this field and apply high-order correlation 
modeling and analytic methods. 

At the beginning of the development of machine learning on graphs and network
science, only graphs were used to model the network or the correlations, and
the associations between the elements of the system were generally described
by the topological structure of the graph. As a result, the pairwise connections
could be described in the graph, while a large amount of semantic information
in the system could be lost, and descriptive features in the network could not
be extracted. Some well-discussed network properties, such as degree centrality,
semi-locality centrality, and closeness centrality, were all based on such a static
single network model. The underlying high-order information behind the data had
to be reduced to pairwise information for processing, which may lead to serious
information loss. With the development of big data, the explosive growth of
data demonstrates its complexity and diversity, which calls for more complex
data modeling methods. Network modeling methods for complex data types,
complex topological structures, and complex connection patterns have emerged. For
example, the social closeness between individuals in a social network can be strong 
or weak, and a system with weight distribution for the association between vertices 
can be modeled using a weighted network [5]. Also, the power network and the 
communication network are inter-dependent in infrastructure construction. The 
vertices of the communication network provide control signals to the vertices of 
the power network, whereas the vertices of the power network supply power to 
the vertices of the communication network. The interdependence between different 
networks can be modeled using an inter-dependent graph [6]. Another example is 
the air transportation network, where the routes between the vertices may belong to 
different airlines. For the heterogeneity of object types and association relationships,
the concept of the multi-layer network or graph has been proposed [7]. A final example
is the ecological food chain in a species network, which changes with
seasonal environmental conditions. For dynamic systems, the concept of the temporal
network has been introduced [8] to formulate the correlations among the subjects.

Although graph-based methods have been developed for decades and great
progress has been achieved, they still have limitations. These graph models
can formulate the binary relationships between the elements in the system well,
but they may ignore the high-order correlations among three or more elements.
In recent years, many studies have shown that modeling and optimizing high-order
correlations are even more important in most applications [9-11].
For example, in the biosphere system, the high-order interactions between species 
ensure stable diversity of species [10]. The high-order characteristics of different 
networks can effectively distinguish their fields [11]. With the rapid development
of network science, the complexity of data and correlations has increased rapidly. In the
fields of biological information, social computing, and image processing, there are
large amounts of multi-modal, heterogeneous, high-level data, and there is a need
for effective high-order correlation modeling and optimization methods.

As the subject of interdisciplinary study in many different fields including 
computer science, physics, and biology, high-order correlation modeling and opti- 
mization have attracted much attention in recent decades. There are a large number 
of high-order relationships in many systems in the real world [12]. For example, 
in social networks, people form groups of three or more to communicate, and 
in academic networks, multiple authors cooperate to write an article. Protein 
interactions in biological networks may occur between multiple proteins, and gene 
expression is driven by high-order interactions between biomolecules [13]. High-order
associations among elements are difficult to describe with the topology
of simple graphs. Under such circumstances, corresponding mathematical
representations have been introduced, such as set systems [14], simplicial complexes,
and hypergraphs [15]. However, how to deploy these mathematical representations in
a computation paradigm is still an open problem. The complexity of high-order
correlations is much higher than that of pairwise correlations, which brings about 
new challenges to computation paradigms. 

The hypergraph, as a generalization of the graph that is able to formulate high-order
correlations among data, has been investigated recently. In this book, we
introduce recent progress on hypergraph computation, from hypergraph modeling
to hypergraph neural networks. Below we first introduce the basic definitions of
the hypergraph and then show its applications and research history.
Finally, we provide a summary of our work in hypergraph computation and the
structure of this book.

The hypergraph is an important concept in discrete mathematics and is a
generalization of the graph. Therefore, many concepts of hypergraphs can be defined
by analogy with the well-known definitions for graphs. A hypergraph is defined as a
pair consisting of a hypervertex set and a hyperedge set. The hypervertex set, also
called the vertex set, is a finite set, whereas each hyperedge is a subset of the
vertex set. As a hyperedge can connect any number of vertices, more general types of
relationships can be modeled by hypergraphs than by graphs. The order and the size of
a hypergraph are defined based on the vertex set and hyperedge set: the order
of the hypergraph is the cardinality of the vertex set, and the size of the
hypergraph is the cardinality of the hyperedge set.
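
As a minimal sketch of this definition, a hypergraph can be stored as a vertex set together with a list of hyperedges, each of which is a subset of the vertex set. The class name Hypergraph, its methods, and the toy data are illustrative assumptions made here; they are not an API defined in this book or in any library.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """A hypergraph as a pair (vertex set, hyperedge list); names are illustrative."""
    vertices: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)  # each hyperedge is a frozenset

    def add_hyperedge(self, members):
        edge = frozenset(members)
        if not edge:
            raise ValueError("empty hyperedges are not allowed")
        if not edge <= self.vertices:
            raise ValueError("a hyperedge must be a subset of the vertex set")
        self.hyperedges.append(edge)

    def order(self):
        # Order: cardinality of the vertex set.
        return len(self.vertices)

    def size(self):
        # Size: cardinality of the hyperedge set.
        return len(self.hyperedges)


# A small hypothetical hypergraph with 5 vertices and 2 hyperedges.
hg = Hypergraph(vertices={"a", "b", "c", "d", "e"})
hg.add_hyperedge({"a", "b", "c"})  # a hyperedge may join more than two vertices
hg.add_hyperedge({"c", "d"})
print(hg.order(), hg.size())
```

Storing hyperedges as sets rather than pairs is the only structural change with respect to a simple graph representation.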

Similar to graphs, two specific types of hypergraphs can be defined, including 
the empty hypergraph and the trivial hypergraph. 


* The empty hypergraph is the hypergraph with empty vertex set and empty 
hyperedge set. 

* The trivial hypergraph is the hypergraph with nonempty vertex set and empty 
hyperedge set. 


Generally speaking, unless stated otherwise, hypergraphs have a nonempty vertex 
set and nonempty hyperedge set and do not contain empty hyperedges. 

The isolated vertex denotes the vertex which is not contained in any of the 
hyperedges. Two vertices are adjacent if there exists a hyperedge containing both 
of these two vertices. Two hyperedges are incident if they have a nonempty 
intersection. 
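
The three relations above translate directly into set operations. The following is a small sketch under the assumption that hyperedges are given as Python sets; the function names and the example data are hypothetical.

```python
def is_isolated(vertex, hyperedges):
    """A vertex is isolated if no hyperedge contains it."""
    return all(vertex not in e for e in hyperedges)


def are_adjacent(u, v, hyperedges):
    """Two vertices are adjacent if some hyperedge contains both of them."""
    return any(u in e and v in e for e in hyperedges)


def are_incident(e1, e2):
    """Two hyperedges are incident if their intersection is nonempty."""
    return bool(set(e1) & set(e2))


# Hypothetical example: vertices a..g, where g belongs to no hyperedge.
edges = [{"a", "b", "c"}, {"c", "d"}, {"e", "f"}]
print(is_isolated("g", edges))           # g is in no hyperedge
print(are_adjacent("a", "c", edges))     # a and c share the first hyperedge
print(are_incident(edges[0], edges[1]))  # the first two hyperedges share c
```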

The sub-hypergraph and partial hypergraph can be defined as follows: 


* An induced sub-hypergraph of a given hypergraph is a hypergraph whose vertex
set is a subset of the vertex set of the given hypergraph, and whose hyperedges
are the nonempty intersections of the original hyperedges with this subset, where
a hyperedge is kept only if it has only one element or its intersection with the
subset contains no fewer than two vertices.

* A sub-hypergraph of a given hypergraph is a hypergraph whose
vertex set and hyperedge set are both subsets of those of the given hypergraph.

* A partial hypergraph is a hypergraph whose hyperedge set is a subset of the
hyperedge set of the given hypergraph.


Two special types of the hypergraph can be defined based on the degree: 


* A regular hypergraph is the hypergraph in which all of the vertices have the same 
degree. 

* A uniform hypergraph is the hypergraph in which all of the hyperedges have the 
same degree. 
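
The two properties can be checked as sketched below. The text does not define the degrees at this point, so the sketch assumes the usual unweighted convention: the degree of a vertex is the number of hyperedges containing it, and the degree of a hyperedge is the number of vertices it contains. The function names are illustrative.

```python
def vertex_degree(v, hyperedges):
    # Assumed convention: number of hyperedges that contain v.
    return sum(1 for e in hyperedges if v in e)


def hyperedge_degree(e):
    # Assumed convention: number of vertices in the hyperedge.
    return len(e)


def is_regular(vertices, hyperedges):
    """All vertices have the same degree."""
    return len({vertex_degree(v, hyperedges) for v in vertices}) <= 1


def is_uniform(hyperedges):
    """All hyperedges have the same degree."""
    return len({hyperedge_degree(e) for e in hyperedges}) <= 1


# Hypothetical examples on the vertex set {1, 2, 3, 4}.
vertices = {1, 2, 3, 4}
cycle = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]  # every vertex in 2 edges, every edge of size 2
mixed = [{1, 2, 3}, {3, 4}]               # vertex 3 has degree 2, edge sizes differ
print(is_regular(vertices, cycle), is_uniform(cycle))
print(is_regular(vertices, mixed), is_uniform(mixed))
```

Under this convention, a simple graph is exactly a uniform hypergraph in which every hyperedge has degree 2.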


Fig. 1.3 An example of a hypergraph

The concept of connectivity is defined as follows. A loop is a hyperedge
with only one element. A path is an alternating sequence of vertices and hyperedges,
in which each vertex belongs to the hyperedges next to it in the sequence. A cycle
is a path whose first vertex is the same as its last vertex. The length of a path is the
number of vertices in the path. A path connects two vertices if these two vertices are
in the path. A hypergraph is connected if every pair of vertices is connected by a
path; otherwise, it is disconnected. The distance between two vertices is the minimum
length of a path connecting these two vertices. The diameter of the hypergraph is the
maximum distance among all pairs of vertices.
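
These notions can be computed with a breadth-first search over vertices, moving from a vertex to every vertex that shares a hyperedge with it. The sketch below follows the convention stated above that the length of a path is its number of vertices, so two adjacent vertices are at distance 2. The function names and the toy hypergraph are assumptions made for illustration.

```python
from collections import deque


def distances_from(source, hyperedges):
    """Minimum path length, counted in vertices, from source to each reachable vertex."""
    dist = {source: 1}  # the path made of the source alone has one vertex
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for e in hyperedges:
            if u in e:
                for w in e:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        queue.append(w)
    return dist


def is_connected(vertices, hyperedges):
    vertices = list(vertices)
    if not vertices:
        return True
    reached = distances_from(vertices[0], hyperedges)
    return all(v in reached for v in vertices)


def diameter(vertices, hyperedges):
    if not is_connected(vertices, hyperedges):
        raise ValueError("diameter is only computed here for connected hypergraphs")
    return max(max(distances_from(v, hyperedges).values()) for v in vertices)


# Hypothetical chain-like hypergraph.
vertices = ["a", "b", "c", "d", "e"]
edges = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}]
print(distances_from("a", edges))
print(is_connected(vertices, edges), diameter(vertices, edges))
```

Adding a vertex that belongs to no hyperedge makes is_connected return False, which mirrors the role of isolated vertices in the definition.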

Here, we provide an example of a hypergraph in Fig. 1.3. In this hypergraph, there
are 11 vertices and 5 hyperedges. The hyperedge e1 connects
vertices x1, x2, x3, and x4. The hyperedge e2 connects x4, x6, x7, and x8. The
hyperedge e3 connects x5 and x8. The hyperedge e4 connects x4, x5, and x8. The
hyperedge e5 is a loop, which only connects vertex x10 to itself. Vertices x9 and x11
are two isolated vertices. The hypergraph is disconnected since x11 is not connected
with any other vertex. x3 → e1 → x1 → e3 → x8 → e2 → x7 is a path from x3 to
x7, with length 4. The distance between x4 and x5 is 3 since the shortest path from
x4 to x5 is x4 → e2 → x8 → e4 → x5.

Besides Fig. 1.3, there are other typical illustrations of a hypergraph, which are
shown in Fig. 1.4. In Fig. 1.4a, each circle represents a hyperedge. In Fig. 1.4b, all
the lines with the same color represent a hyperedge and connect the vertices in
that hyperedge. In Fig. 1.4c, each hollow circle indicates a hyperedge, and the lines
with the same color link the vertices in the hyperedge.

Fig. 1.4 Three typical hypergraph illustrations

Note that hypergraph-type structures may not be explicit in many
applications; they may be hidden behind the data that can be observed directly.
In some cases, we may only capture some pairwise correlations among the data,
while the high-order correlations need to be regenerated based on these
observations. For example, some popular citation networks, such as Cora, Citeseer,
and PubMed [16], are widely used for analysis, but all these datasets only contain
graph-type data, which treat the articles as vertices and the citation relationships as
links. Under such circumstances, to exploit the high-order correlations among these
data, we need to transform the data into a hypergraph. As a typical method, a
co-authorship hypergraph can be generated, which formulates the articles as vertices
and connects articles with the same author by a hyperedge. In a similar way,
a co-citation hypergraph can be generated, which also treats the articles as vertices
and treats the articles with the same citation as a hyperedge.
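
The co-authorship construction can be sketched as follows, assuming the author list of each article is available as metadata. The function name and the toy records are hypothetical and do not come from the Cora, Citeseer, or PubMed datasets.

```python
from collections import defaultdict


def coauthorship_hyperedges(article_authors):
    """Build one hyperedge per author: the set of articles written by that author."""
    by_author = defaultdict(set)
    for article, authors in article_authors.items():
        for author in authors:
            by_author[author].add(article)
    return dict(by_author)


# Hypothetical toy metadata: article id -> list of authors.
article_authors = {
    "p1": ["alice", "bob"],
    "p2": ["alice"],
    "p3": ["alice", "carol"],
    "p4": ["bob"],
}
hyperedges = coauthorship_hyperedges(article_authors)
for author, articles in sorted(hyperedges.items()):
    print(author, sorted(articles))
```

Under one reading of the co-citation case, the same function could be applied to a mapping from each article to the works it cites, so that each cited work yields one hyperedge.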


1.3 Applications of Hypergraph


Hypergraphs have been applied across several disciplines, including biology,
economics, and sociology, due to their superiority in modeling complex correlations,
which has promoted intelligent applications. In this part, we introduce several
typical applications of hypergraphs to help understand this powerful tool.

One representative application is social computing. Social media data have
been increasing rapidly over the past couple of decades and can provide potential
population-level insights. The hypergraph [17] is a useful tool for discovering
complex and hidden correlations in the data, as the hypergraph structure
can be used to formulate the high-order correlations in social networks.

In recommender systems, the hypergraph is used to model the user-item network,
to profile users, and to further predict their preferences (future interactions). Given
the raw user-item network without information other than the historical interactions
between users and items, the hypergraph [17] can be used to discriminatively formulate
the high-order connectivities among users and items separately and conduct the
collaborative filtering task. Sometimes the users and the items may be attached
with different attributes or properties. For example, the user-side information
may include gender, age, and personality, and the item-side information may
contain the category, text description, and image. This attribute information can
help capture the user’s preference. Therefore, another application of hypergraphs in
recommender systems is attribute modeling and inference.

Another popular yet challenging social media computing application is sentiment
analysis, with the goal of recognizing the real emotions and attitudes of people in
social media contexts. However, the multi-modality and complexity of social
media data make the task difficult. For example, text, images, and
videos may coexist in one tweet. Additionally, there are intricate relationships
between posts, such as in the dimensions of time, location, and user preferences.
Therefore, how to find the complex relationships between tweets and analyze
user sentiment has become an urgent issue. To this end, the hypergraph [18] can be
used to formulate the correlations among samples and conduct robust and accurate
multi-modal sentiment prediction, taking into consideration that different moods have
their own characteristics and that sentiment analysis should be based on the joint
analysis of multiple sources of information. As far as social event detection is
concerned, exploring a set of highly related posts becomes more important, because a
single post often contains noise and insufficient content and thus fails to convey
clear and comprehensive information. The hypergraph [19] can be used to characterize
the relationships between heterogeneous data among different tweets, owing to its
superiority in modeling high-order correlations between data of various posts,
modalities, and times, thereby enabling real-time social event detection.
Specifically, each microblog is connected with several textually related and visually
related microblogs, forming two hyperedges. Next, the microblog clique, a basic unit
consisting of a set of highly related tweets, is produced by using the hypergraph cut
method to group microblogs that are about the same subject.

Hypergraphs have also shown their advantages in medical and biological applications.
In the past few decades, massive amounts of biological and medical data have
been produced. The data are complex, heterogeneous, and multi-modal, with
interwoven inter- and intra-data correlations. By concatenating hyperedge groups, the
hypergraph [20-22] can naturally accommodate multi-modal or heterogeneous data.
Moreover, in doing so, it can discriminatively use the complementary information
among these data. The following pipeline describes how hypergraph computation
is used in biological and medical tasks: (1) modeling the medical images,
patches, or biological entities as vertices and connecting them with hyperedges
based on their feature similarity or high-order topological links and (2) learning
high-order correlations between data using a series of hypergraph computation
methods. In this type of application, the hypergraph has been used for mild cognitive
impairment (MCI) identification using magnetic resonance imaging (MRI) [23],
COVID-19 identification using CT imaging [24], ASD identification using brain
functional networks [25], medical image retrieval [26], etc.
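
Step (1) of this pipeline can be illustrated with a nearest-neighbor construction, in which each vertex and its k most similar vertices form one hyperedge. This is only one possible way to connect vertices by feature similarity and is an assumption made here; the text does not prescribe it. The function name, the Euclidean distance, and the feature values are likewise illustrative.

```python
import math


def knn_hyperedges(features, k):
    """For each vertex, form a hyperedge from the vertex and its k nearest neighbors."""
    n = len(features)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of vertices")
    hyperedges = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),  # Euclidean distance
        )
        hyperedges.append(frozenset([i] + others[:k]))
    return hyperedges


# Hypothetical 2-D features forming two well-separated groups of three samples.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
edges = knn_hyperedges(features, k=2)
for i, e in enumerate(edges):
    print(i, sorted(e))
```

With k = 2, every hyperedge in this toy example stays within one of the two groups, so the hyperedges capture group-level rather than pairwise similarity.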

The aforementioned examples are just a small part of hypergraph applications.
Hypergraph computation techniques can be used in any case where there exist
high-order and complex correlations among data, such as computer vision, knowledge
graphs, and so on.


1.4 The History of Studies on Hypergraph
1.4.1 Topology and Coloring on Hypergraph 


Studies on hypergraphs and their use have a long history. In 1943, Prenowitz
et al. [27] first illustrated several kinds of geometries (projective, descriptive, and
spherical) as hypergroups or multigroups. Prenowitz et al. [28] created Geometries on
Join Spaces, a unique hypergroup that has been proven to be a valuable tool in the 
study of a variety of topics, including graphs, hypergraphs, binary relations, fuzzy 
sets, and rough sets. In 1996, Rosenberg et al. [29] first addressed the relationships 
between Hyperstructures (hypergraphs) and Binary Relations in the broadest sense. 
Later, they were also studied by Corsini and Leoreanu [30]. Rosenberg et al. [29] 
first developed join spaces related to fuzzy sets in 1996. Corsini, Leoreanu, and 
Tofan [31] have all reexamined these structures. Zahedi et al. [32] also advanced the 
concepts of linking a hypergraph with a fuzzy set and examining algebraic structures 
equipped with a fuzzy structure. 

Hypergraph coloring is a typical and important task, which has attracted much
attention since the last century. It is fundamental to combinatorics and can be used
to determine bounds for the chromatic number of some graphs, as described by
Kierstead et al. [33]. Lu et al. [34] suggested using these algorithms to solve different
optimization problems, such as divide and conquer and partition problems, in which
hypergraph coloring can also be used to find monochromatic paths and cycles.
Voloshin et al. [35, 36] described how to color mixed hypergraphs, whose hyperedges
are divided into hyperedge and anti-hyperedge families, and further applied it
to the energy supply problem.

The problem of finding large matchings is closely related to the problem of
bounding the chromatic index of a hypergraph (note that each color class of a
proper edge-coloring forms a matching). As a classical subject in the study of graphs,
matching theory is very well developed and goes back to the work [37] in the 1930s.
Tutte's theorem [38] is a characterization of graphs that contain perfect matchings.
Edmonds et al. [39] proposed the Blossom algorithm, which finds a maximum
matching in a graph in polynomial time. The above methods are early works on
hypergraph-related research.


1.4.2 Hypergraph Partitioning, Clustering, and Machine 
Learning 


Hypergraph partitioning is another important problem on hypergraphs. According to the
Encyclopedia of Parallel Computing,¹ hypergraph partitioning involves
dividing a hypergraph into two or more roughly equal parts in such a way that
the cost function of the hyperedges connecting vertices in different parts is
minimized. In many cases, partitioning into only two parts is too restrictive, and more
than two parts are required. Karypis et al. [40] proposed the hMETIS algorithm, which is
based on multilevel coarsening of hypergraphs. The method iteratively bisects the
coarsened hypergraphs, starting with the smallest. George et al. [41] further developed
the hMETIS-Kway algorithm, which directly constructs a K-way partitioning of a
hypergraph with a coarsening-uncoarsening paradigm to solve the K-way hypergraph
partitioning problem.

¹ https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4. 1.
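
To make the objective concrete, the sketch below evaluates one common choice of cost function, the total weight of hyperedges whose vertices fall into more than one part (often called the cut-net metric). The text does not fix a particular cost function, so this choice, the function name, and the toy data are assumptions; the sketch only scores a given partition and is not the hMETIS algorithm.

```python
def cut_cost(hyperedges, part_of, weights=None):
    """Total weight of hyperedges that span more than one part."""
    if weights is None:
        weights = [1.0] * len(hyperedges)  # unit weights by default
    cost = 0.0
    for e, w in zip(hyperedges, weights):
        parts_touched = {part_of[v] for v in e}
        if len(parts_touched) > 1:
            cost += w
    return cost


# Hypothetical hypergraph on vertices 1..6.
edges = [{1, 2, 3}, {3, 4}, {4, 5, 6}]

# Two balanced bipartitions, given as vertex -> part index.
good = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
bad = {1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1}
print(cut_cost(edges, good))  # only the hyperedge {3, 4} is cut
print(cut_cost(edges, bad))   # all three hyperedges are cut
```

A partitioning method then searches over roughly balanced assignments for one that minimizes this cost.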

Besides, Papa et al. [42] provided several methods of partitioning hypergraphs
and defined clustering as “the process of merging vertices into larger groups
of vertices known as clusters to compute a coarser hypergraph from an input
hypergraph.” A number of applications of partitioning and clustering are also
given, including VLSI design, numerical linear algebra, automated theorem proving,
and formal verification. Several other applications and methods have been described in
the literature. For more details, a survey of clustering ensemble techniques has
been published in [43], which includes hypergraph partitioning techniques as well.
Multilevel strategies, which are often required in clustering and partitioning, have
been well studied in previous works. They have been extensively used in VLSI design
[40], parallel scientific computing [44-46], image categorization [47], and social
networks [48, 49].

In this century, hypergraphs have been used in machine learning. Transductive
hypergraph learning [48] was introduced to give the basic mathematical formulation
of the objective function for predicting the labels of vertices on a hypergraph. Since
the performance of hypergraph learning is related to the modeling quality of the 
hypergraph, there are some efforts to further assign weights to the components in 
the hypergraph, including hyperedges, vertices, and hyperedge-dependent vertex 
weights [50, 51]. To accelerate the label propagation process on hypergraph, the 
cross diffusion on multiple hypergraphs is further introduced to model the high- 
order correlations among multi-modal data and conduct multi-modal information 
fusion [52]. 


1.4.3 Deep Learning on Hypergraph 


Research on high-order representations of hypergraph structures has also been 
inspired by deep learning's powerful learning and modeling abilities. Generally 
speaking, most deep learning methods on hypergraph can be divided into spectral- 
based methods and spatial-based methods. 

As for the spectral-based methods, Feng et al. [53] proposed Hypergraph Neural
Networks (HGNNs) to model non-pairwise relations based on the hypergraph
Laplacian. Multi-modal data can be naturally modeled using the proposed method.
It is also possible to classify images using hypergraph neural networks [54].
Using tools from the spectral theory of hypergraphs, Yadati et al. [55] proposed
HyperGCN, which trains graph convolutional networks (GCNs) for semi-supervised
learning on hypergraphs. As for the spatial-based methods, by extending
dynamic hypergraph learning, Jiang et al. [56] proposed a dynamic hypergraph
neural network, which can adaptively change the hypergraph structure at each
layer. As opposed to hypergraph convolution, where the underlying structure is
defined beforehand, Bai et al. [57] proposed a hypergraph attention mechanism
to learn a dynamic connection of hyperedges, which propagates and
gathers information in the task-relevant parts of the graph, thereby generating
more discriminative vertex embeddings. Moreover, Gao et al. [58] proposed a
general hypergraph neural network framework, which can be applied to multiple
types of hypergraphs, such as undirected hypergraphs, directed hypergraphs,
probabilistic hypergraphs, and vertex/hyperedge weighted hypergraphs.

For homogeneous and heterogeneous hypergraphs, Zhang et al. [59] proposed a
self-attention-based hypergraph neural network (Hyper-SAGNN). By mapping the
hypergraph to a weighted attributed line graph, Bandyopadhyay et al. [60] achieved
a bi-injective hypergraph structure. Huang et al. [61] proposed UniGNN, which
can generalize general GNN models to hypergraphs by interpreting the message
passing process in graph and hypergraph neural networks. These neural network
methods on hypergraph enable representation learning that incorporates
high-order correlations in the process.


1.5 Hypergraph Computation: Challenges and Objectives


Hypergraph has an advantage in high-order correlation modeling compared with
graph and other structures. To exploit this advantage in practice, hypergraph can
be used to formulate such correlations and then conduct computing tasks
accordingly. In this part, we summarize the objectives of hypergraph computation,
especially the main challenges and the tasks involved.

Below we give the definition of hypergraph computation: hypergraph computa- 
tion is to formulate the high-order correlations underneath the data using hypergraph 
and then conduct semantic computing on the hypergraph for different applications. 

The main challenges and objectives in hypergraph computation come from three
aspects: how to generate a hypergraph, how to deal with large scale data,
and how to conduct learning on hypergraph.


1. How to generate a hypergraph. In most cases, the hypergraph structure does
not explicitly exist. What can be observed could be unstructured data, such
as images, videos, and discrete signals, or pairwise relationships between two
subjects. To reveal the underlying high-order correlations as a hypergraph, it
is necessary to define how to generate it. More importantly, the observed data
could be noisy, incomplete, and multi-modal, so describing these data is also
challenging. Under such circumstances, it is difficult to generate an
accurate hypergraph structure based on these data. Therefore, how to generate a
hypergraph, especially a good hypergraph structure for a specific task, is the
first challenge in practice.

2. How to deal with large scale data. Computational complexity is a major issue
for graph data, and it is also very serious for hypergraph. The data in many
applications, such as social media and brain neurons, reach the million level or
more. Confronting such large scale data, how to effectively and efficiently
conduct storage and computing on hypergraph requires further research.

3. How to conduct learning on hypergraph. Given a hypergraph, learning tasks
can be conducted on the hypergraph structure, and it is important to design label
propagation methods on hypergraph. Besides traditional feature representation
methods, the connections themselves can also be used as a representation. Given
the high-order correlations encoded by the hypergraph, it is useful to learn new
representations on it. Therefore, how to conduct representation learning on
hypergraph is an important topic.


Hypergraph modeling can be broadly divided into two categories, i.e., the
intra-correlation modeling and the inter-correlation modeling, as shown in Fig. 1.5.
Here, the intra-correlation modeling regards the high-order correlations inside the
subject. The components of the subject are represented as the vertices, and the
correlations among these components are represented as hyperedges in the
hypergraph. In these cases, the hypergraph, named intra-hypergraph, aims to
represent the subject itself. The inter-correlation modeling concentrates on the
high-order correlations among different subjects. A group of subjects is represented
as the vertices, and the correlations among these subjects are represented as
hyperedges in the hypergraph, named inter-hypergraph. The objective is to learn the
representation or connections of the target subject with the help of its correlations
to other subjects. Here we take image representation as an example. When an image
is selected as the subject, the correlations among the pixels or the patches in the
image are intra-correlations, and the corresponding intra-hypergraph can be
generated for image representation. On the other hand, we can also observe other
images for processing. The correlations among the subject image and other images
are inter-correlations, and the corresponding inter-hypergraph can be generated for
image representation too. That is to say, the intra- and inter-correlations can be
regarded as views at different scales. If we take the subject itself as the target
system, the correlations of the subject and other subjects are inter-correlations of
the subject, corresponding to an inter-hypergraph. If we take the group of subjects
as the target system, the correlations of these subjects are intra-correlations,
leading to an intra-hypergraph accordingly.

Fig. 1.5 The intra-hypergraph and the inter-hypergraph based on the intra-correlations and the
inter-correlations among components and subjects


1.6 Structure of This Book 


This book is composed of 13 chapters, and the structure of the remaining chapters
is introduced here.


* Chapter 2. Mathematical Foundations of Hypergraph. This chapter introduces the
fundamental mathematics of hypergraph and presents the mathematical notations 
that are used to facilitate deep understanding and analysis of hypergraph 
structure. 

* Chapter 3. Hypergraph Computation Paradigm. This chapter introduces three
typical hypergraph computation paradigms, including intra-representation
computing, inter-representation computing, and group correlation computing.

* Chapter 4. Hypergraph Modeling. This chapter introduces different hypergraph 
modeling methods, including implicit hypergraph modeling and explicit hyper- 
graph modeling. Examples on computer vision, recommender system, and other 
applications are also provided in this chapter. 

* Chapter 5. Typical Hypergraph Computation Tasks. This chapter introduces the 
typical hypergraph computation tasks, including label propagation on hyper- 
graph, data clustering on hypergraph, imbalanced learning on hypergraph, and 
link prediction on hypergraph. 

* Chapter 6. Hypergraph Structure Evolution. This chapter introduces the structure 
evolution methods on the hypergraph, which optimize the hypergraph struc- 
ture accordingly, including both the hypergraph component optimization and 
hypergraph structure optimization. We briefly introduce the incremental learning 
method on growing data. 

* Chapter 7. Neural Networks on Hypergraph. This chapter introduces recent
progress on hypergraph neural networks, including the spectral-based methods
and the spatial-based methods. The comparison between graph neural networks
and hypergraph neural networks is also provided in this chapter.

* Chapter 8. Large Scale Hypergraph Computation. This chapter introduces how 
to deal with large scale data. More specifically, two kinds of large scale 
hypergraph computation methods, i.e., factorization-based hypergraph reduction 
and hierarchy-based hypergraph learning, are provided in this chapter. 

* Chapter 9. Hypergraph Computation for Social Media Analysis. This chapter 
introduces applications of hypergraph computation on social media analysis, 
including recommender system, sentiment analysis, and emotion recognition. 

* Chapter 10. Hypergraph Computation for Medical and Biological Applications.
This chapter introduces applications of hypergraph computation in medicine and
biology, including computer-aided diagnosis, survival prediction
with histopathological images, drug discovery, and medical image segmentation.

* Chapter 11. Hypergraph Computation for Computer Vision. This chapter intro- 
duces applications of hypergraph computation on computer vision, including 
visual classification, 3D object retrieval, and tag-based social image retrieval. 

* Chapter 12. The DeepHypergraph Library. This chapter introduces the DeepHy- 
pergraph Library, a hypergraph computation library based on Python. 

* Chapter 13. Conclusions and Future Work. This chapter concludes this book and 
introduces three further research directions of hypergraph computation. 


1.7 Summary


In this chapter, we introduce the basic ideas and background of hypergraph
computation. We also provide the applications and the related research history
of hypergraph. The idea of hypergraph computation is introduced and
discussed in detail in this chapter. We also summarize our studies on hypergraph
computation and present the organization of this book.


Chapter 2
Mathematical Foundations of Hypergraph


Abstract In this chapter, we introduce the mathematical foundations of hypergraph 
and present the mathematical notations that are used to facilitate deep understanding 
and analysis of hypergraph structure. A hypergraph is composed of a set of vertices 
and hyperedges, and it is a generalization of a graph, where a weighted hypergraph 
quantifies the relative importance of hyperedges or vertices. Hypergraph can also be
divided into two main categories, i.e., the undirected hypergraph representation and
the directed hypergraph representation. The latter further divides the vertices
in one hyperedge into the source vertex set and the target vertex set to model more
complex correlations. Additionally, we discuss the relationship between hypergraph
and graph from the perspective of structural transformation and expressive ability.
The most intuitive difference between a simple graph and a hypergraph lies in the
order of the correlations they model and in how adjacency is expressed. A
hypergraph can be converted into a simple graph using clique expansion, star
expansion, and line expansion. Moreover, the proof based on random walks and
Markov chains establishes the relationship between hypergraphs with
edge-independent vertex weights and weighted graphs.


2.1 Introduction 


The importance of high-order complex network modeling has been discussed in
Chap. 1. In this chapter, we introduce the basic knowledge of hypergraph. In a
simple graph, the degree of every edge is two, whereas in a hypergraph the
hyperedge degree is usually higher. Different from a graph structure that can model
pairwise connections with its 2-degree edges, a hypergraph can model correlations
between practical data that are much more complex than pairwise relationships. As
a result of its versatility and usefulness in modeling complex correlations of data,
machine learning on hypergraph has attracted increasing attention.

Machine learning methods on hypergraph have been used in many real-world
applications due to their advantages. A wide variety of tasks have been performed
with hypergraph in computer vision, including image retrieval [1], 3D object
classification [2], video segmentation [3], person re-identification [4],
hyper-spectral image analysis [5], landmark retrieval [6], and visual tracking [7].
It is possible to embed a wide range of subjects into a hypergraph structure for
these tasks. In different tasks, the hypergraph structure can be used to formulate
the correlation among a variety of subjects. In image retrieval [1], the correlation
among different images can be modeled in a hypergraph, where each vertex denotes
an image and the hyperedges can be generated by finding similar image features.
In 3D object classification [2], the correlation among different 3D objects can be
modeled in a hypergraph, where each vertex denotes a 3D object and the hyperedges
can be generated based on the similarity among these 3D objects. In person
re-identification [4], a hypergraph structure can be constructed, where each vertex
represents an image of a person and the hyperedges can be generated based on the
similarities in the feature space. Similar modeling attempts have been deployed in
medical image analysis and bio-informatics studies to identify genes [8, 9], predict
diseases [10, 11], identify sub-types [12], and analyze functional networks [13].

Before detailed introduction of the hypergraph computation paradigm, hyper- 
graph modeling, and other related methods and applications, in this chapter, we 
first present preliminary knowledge of hypergraph and multiple representations of 
hypergraph. We also compare the hypergraph structure with the graph structure from 
four aspects. 


2.2 Preliminary Knowledge of Hypergraph 


The basic concepts of hypergraph are hereby briefly discussed. Table 2.1 provides 
the main notations and definitions of hypergraphs throughout this chapter. We 
first introduce undirected hypergraph and directed hypergraph, respectively, and 
then introduce the K-uniform hypergraph, probabilistic hypergraph, the relationship 
between hypergraph and bipartite graph, and the weights on hypergraph. 


2.2.1  Undirected Hypergraph 


Table 2.1 Notations and definitions of hypergraphs

Notation | Definition
$\mathcal{G}$ | The hypergraph
$\mathcal{V}$ | The set of vertices
$\mathcal{E}$ | The set of hyperedges
$\mathbf{W}$ | The diagonal matrix of the hyperedge weights
$\mathbf{U}$ | The diagonal matrix of the vertex weights
$\mathbf{X}$ | The vertex feature matrix
$\mathbf{Y}$ | The vertex label matrix
$\mathbf{H}$ | The $\lvert\mathcal{V}\rvert \times \lvert\mathcal{E}\rvert$ incidence matrix of undirected hypergraph structure. $\mathbf{H}(v, e)$ indicates the connection strength between vertex $v$ and hyperedge $e$
$\mathbf{D}_v$ | The diagonal matrix of vertex degrees
$\mathbf{D}_e$ | The diagonal matrix of hyperedge degrees
$\boldsymbol{\Delta}$ | The Laplacian matrix of hypergraph
$\mathbf{x}_i$ | The feature vector of vertex $v_i$
$d(v)$ | The degree of vertex $v$
$\delta(e)$ | The degree of hyperedge $e$
$w(e)$ | The weight of hyperedge $e$
$u(v)$ | The weight of vertex $v$


Let $\mathcal{G}$ denote a hypergraph (undirected hypergraph), which consists of a
set of vertices $\mathcal{V}$ and a set of hyperedges $\mathcal{E}$. In a weighted
hypergraph, each hyperedge $e \in \mathcal{E}$ is assigned a weight $w(e)$,
symbolizing the importance of the connection relationship throughout the whole
hypergraph. Let $\mathbf{W}$ denote the diagonal matrix of the hyperedge weights,
i.e., $\mathrm{diag}(\mathbf{W}) = [w(e_1), w(e_2), \ldots, w(e_{\lvert\mathcal{E}\rvert})]$.
Given a hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$, the
structure of the hypergraph is usually represented by an incidence matrix
$\mathbf{H} \in \{0, 1\}^{\lvert\mathcal{V}\rvert \times \lvert\mathcal{E}\rvert}$, with each entry
$\mathbf{H}(v, e)$ indicating whether the vertex $v$ is in the hyperedge $e$:

$$
\mathbf{H}(v, e) =
\begin{cases}
1 & \text{if } v \in e \\
0 & \text{if } v \notin e,
\end{cases}
\tag{2.1}
$$


where $\mathbf{H}(v, e)$ indicates the possibility of vertex $v$ being assigned to
hyperedge $e$ or the importance of vertex $v$ for hyperedge $e$. The degree of
hyperedge $e$ and the degree of vertex $v$ are defined as follows:

$$
\delta(e) = \sum_{v \in \mathcal{V}} \mathbf{H}(v, e), \tag{2.2}
$$

and

$$
d(v) = \sum_{e \in \mathcal{E}} w(e) \cdot \mathbf{H}(v, e). \tag{2.3}
$$


The traditional hypergraph structure creates associations among vertices, with a
single hyperedge connecting multiple vertices that have associations. All vertices on
the same hyperedge are given a value of 1 in the incidence matrix $\mathbf{H}$. The
incidence matrix $\mathbf{H}$ is calculated as in Eq. (2.1), and its elements take the
value 0 or 1. Each row corresponds to a vertex in the hypergraph, and each column
corresponds to a hyperedge and represents the set of vertices on this hyperedge.
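
As a minimal sketch of Eqs. (2.1)-(2.3), the following Python snippet builds the
incidence matrix of a small hypergraph and computes the hyperedge and vertex
degrees. The toy hypergraph, the 0-based vertex indices, and the uniform hyperedge
weights are illustrative assumptions and do not reproduce Fig. 2.1.

```python
# Hypothetical toy hypergraph: vertices are indexed 0..5 and each hyperedge
# is given as a set of vertex indices (an assumption for illustration only).
num_vertices = 6
hyperedges = [{0, 1, 2}, {1, 2, 3}, {2, 3, 5}]
weights = [1.0, 1.0, 1.0]  # w(e) for each hyperedge, assumed uniform here

# Incidence matrix H (Eq. 2.1): H[v][j] = 1 if vertex v is in hyperedge j.
H = [[1 if v in e else 0 for e in hyperedges] for v in range(num_vertices)]

# Hyperedge degree (Eq. 2.2): column sums of H.
edge_degree = [sum(H[v][j] for v in range(num_vertices))
               for j in range(len(hyperedges))]

# Vertex degree (Eq. 2.3): row sums of H weighted by w(e).
vertex_degree = [sum(weights[j] * H[v][j] for j in range(len(hyperedges)))
                 for v in range(num_vertices)]

for row in H:
    print(row)
print("hyperedge degrees:", edge_degree)
print("vertex degrees:", vertex_degree)
```

Vertex 4 belongs to no hyperedge in this toy example, so its row in the incidence
matrix is all zeros and its degree is 0.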

Figure 2.1 shows an undirected hypergraph, including the hypergraph itself, the
incidence matrix $\mathbf{H}$, the vertex set $\mathcal{V}$, the hyperedge set
$\mathcal{E}$, and the weight matrix $\mathbf{W}$. In the illustrated undirected
hypergraph, there are 3 hyperedges $e_1$, $e_2$, and $e_3$ with 6 vertices. The
degree of the hyperedge $e_3$ is 3, as it contains vertices $\{v_3, v_4, v_6\}$.
By the same token, other elements of $\mathbf{D}_e$ can be inferred. Vertex $v_3$
belongs to the hyperedges $e_2$ and $e_3$, and the degree of the vertex is 2. The
incidence matrix $\mathbf{H}$ of the hypergraph is readily obtained by the rules of
construction, as shown on the right side of Fig. 2.1.

When the incidence matrix $\mathbf{H}$ is calculated as in Eq. (2.1), all elements
take the value of either 0 or 1. It is noted, however, that the connection weights of
different vertices on a hyperedge could be different. For example, some vertices are
highly connected in the hyperedge and have high weights, while others may have low
weights. In that case, the entries of $\mathbf{H}$ take continuous values that
represent the vertex importance on each hyperedge, and each column of $\mathbf{H}$
may be normalized to sum to 1 (or not, depending on the application and objective).

There are various rules that can be used to determine whether vertices are 
associated with one another. Hyperedge groups can be generated from the data with 
a graph structure by using pairwise edges and k-hops; for the data without a graph 
structure, they can be generated by using neighbors in feature space. A detailed 
description of these methods is provided in Chap. 4. 


2.2.2 Directed Hypergraph 


The traditional undirected hypergraph representation cannot fully describe the real
world, in which hyperedges may be directional. Therefore, the representation of
directed hypergraph structures is important. In each hyperedge, the vertices can be
further divided into two sets: the source vertex set and the target vertex set. On a
directed hypergraph, a trivial definition [14] of the incidence matrix is as follows:

$$
\mathbf{H}(v, e) =
\begin{cases}
-1 & \text{if } v \in T(e) \\
1 & \text{if } v \in S(e) \\
0 & \text{otherwise},
\end{cases}
\tag{2.4}
$$

where $T(e)$ and $S(e)$ are the target and source vertices for hyperedge $e$,
respectively. The incidence matrix $\mathbf{H}$ is split into two matrices,
$\mathbf{H}_s$ and $\mathbf{H}_t$, describing the source and target vertices for all
hyperedges, respectively. When passing messages with these two incidence matrices,
it is important to maintain the directional information. Unlike in the undirected
hypergraph, message passing in the directed hypergraph is guided by the two
different incidence matrices $\mathbf{H}_s$ and $\mathbf{H}_t$. The average
aggregation of messages is normalized by two matrices, $\mathbf{D}_s$ and
$\mathbf{D}_t$, which can be formulated as follows:

$$
\begin{cases}
\mathbf{D}_s = \mathrm{diag}(\mathrm{col\_sum}(\mathbf{H}_s)) \\
\mathbf{D}_t = \mathrm{diag}(\mathrm{col\_sum}(\mathbf{H}_t)),
\end{cases}
\tag{2.5}
$$

where $\mathrm{diag}(\mathbf{v})$ is a function that converts a vector $\mathbf{v}$
to a diagonal matrix, and $\mathrm{col\_sum}(\cdot)$ is a column accumulation
function.

Fig. 2.2 An example of a directed hypergraph

Figure 2.2 shows an example of a directed hypergraph, including the directed
hypergraph itself, the incidence matrix $\mathbf{H}$, the source incidence matrix
$\mathbf{H}_s$, and the target incidence matrix $\mathbf{H}_t$. The illustrated
directed hypergraph contains six vertices and two hyperedges $e_1$ and $e_2$.
$e_1$ connects four vertices and $e_2$ connects three vertices. In hyperedge $e_1$,
the source vertices are $v_1$ and $v_2$, and the target vertices are $v_4$ and $v_5$.
As for the hyperedge $e_2$, the source vertices are $v_2$ and $v_3$, and the only
target vertex is $v_6$.
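
The construction in Eqs. (2.4) and (2.5) can be sketched in a few lines of Python.
The source and target sets below follow the description of Fig. 2.2 as read from
the text above; the 0-based indexing (index 0 stands for the first vertex) and the
helper names `entry`, `col_sum`, and `diag` are assumptions made for illustration.

```python
# Directed hypergraph with six vertices and two hyperedges, following the
# textual description of Fig. 2.2 (0-based indices are an assumption).
num_vertices = 6
sources = [{0, 1}, {1, 2}]  # S(e1), S(e2)
targets = [{3, 4}, {5}]     # T(e1), T(e2)
num_edges = len(sources)

def entry(v, j):
    """Eq. (2.4): -1 for a target vertex, 1 for a source vertex, 0 otherwise."""
    if v in targets[j]:
        return -1
    if v in sources[j]:
        return 1
    return 0

H = [[entry(v, j) for j in range(num_edges)] for v in range(num_vertices)]
H_s = [[1 if v in sources[j] else 0 for j in range(num_edges)]
       for v in range(num_vertices)]
H_t = [[1 if v in targets[j] else 0 for j in range(num_edges)]
       for v in range(num_vertices)]

def col_sum(matrix):
    """Sum every column of a matrix stored as a list of rows."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]

def diag(vector):
    """Turn a vector into a diagonal matrix."""
    n = len(vector)
    return [[vector[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Eq. (2.5): normalization matrices for source-side and target-side aggregation.
D_s = diag(col_sum(H_s))
D_t = diag(col_sum(H_t))
print(H)
print(D_s, D_t)
```

Keeping $\mathbf{H}_s$ and $\mathbf{H}_t$ as separate non-negative matrices makes
the column sums directly usable as normalization factors, which would not be the
case for the signed matrix $\mathbf{H}$.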


2.2.3 Probabilistic Hypergraph 


In real-world correlations, the intensity of a connection is not necessarily a
binary number but can also be a continuous number from zero to one. Consequently,
the incidence matrix may be a continuous matrix with elements ranging from 0 to 1,
which is adopted to denote a probabilistic hypergraph.

As shown in Fig. 2.3, the probabilistic hypergraph consists of six vertices and
three hyperedges. The hyperedge $e_1$ connects three vertices $v_1$, $v_2$, and
$v_5$. The intensity of the connection in this hyperedge is not the same for each
vertex. As shown on the right side of the figure, $e_1$ connects $v_1$ with an
intensity of 0.3, connects $v_2$ with an intensity of 0.8, and connects $v_5$ with
an intensity of 0.5. The degree of a vertex or a hyperedge in this type of
hypergraph is computed as the sum of the corresponding row or column of the
hypergraph incidence matrix $\mathbf{H}$, as shown at the bottom of Fig. 2.3.

Fig. 2.3 An example of a probabilistic hypergraph
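
The row and column sums mentioned above can be illustrated with a short Python
sketch. The first column reuses the intensities 0.3, 0.8, and 0.5 quoted in the
text; the second hyperedge, the number of vertices, and the row order are
hypothetical and are not taken from Fig. 2.3.

```python
# Continuous incidence matrix of a small probabilistic hypergraph.
# Rows are vertices, columns are hyperedges; entries lie in [0, 1].
H = [
    [0.3, 0.0],
    [0.8, 0.6],
    [0.5, 0.0],
    [0.0, 0.9],
]

# Vertex degree: sum of the corresponding row of H.
vertex_degree = [sum(row) for row in H]

# Hyperedge degree: sum of the corresponding column of H.
edge_degree = [sum(row[j] for row in H) for j in range(len(H[0]))]

print("vertex degrees:", vertex_degree)
print("hyperedge degrees:", edge_degree)
```

Unlike the binary case, these degrees are generally not integers, because each
membership contributes its intensity rather than a full count of 1.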


2.2.4 K-Uniform Hypergraph 


In many applications, all hyperedges in a hypergraph connect the same number of
vertices, which is known as the k-uniform hypergraph. In the k-uniform hypergraph,
each hyperedge contains precisely k vertices, as shown in Fig. 2.4. Under this
definition, a simple graph can be regarded as a special case of hypergraph, i.e., a
2-uniform hypergraph, where each hyperedge only connects two vertices.

Figure 2.4 illustrates an example of a 3-uniform hypergraph. The hypergraph
consists of six vertices and three hyperedges, and each hyperedge contains precisely
3 vertices. Hyperedge $e_1$ connects vertices $v_1$, $v_2$, and $v_5$. Hyperedge
$e_2$ connects vertices $v_1$, $v_2$, and $v_3$. The degree of every hyperedge in
this type of hypergraph is exactly k.
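
A k-uniformity check is straightforward to express in code. The sketch below is
only an illustration: the function name `uniformity` and the example hyperedges are
hypothetical and do not correspond to Fig. 2.4.

```python
def uniformity(hyperedges):
    """Return k if every hyperedge has exactly k vertices, otherwise None."""
    sizes = {len(e) for e in hyperedges}
    return sizes.pop() if len(sizes) == 1 else None

three_uniform = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]
simple_graph = [{1, 2}, {2, 3}, {1, 3}]      # ordinary edges are 2-vertex hyperedges
mixed = [{1, 2, 3}, {3, 4}]

print(uniformity(three_uniform))  # 3
print(uniformity(simple_graph))   # 2
print(uniformity(mixed))          # None
```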


2.2.5 Hypergraph and Bipartite Graph 


The bipartite graph can be indicated by $\mathcal{G} = (\mathcal{U}, \mathcal{V}, \mathcal{E})$.
Unlike the simple graph, vertices in the bipartite graph can be divided into two
disjoint and independent sets $\mathcal{U}$ and $\mathcal{V}$. Every edge only
connects one vertex in set $\mathcal{U}$ and another vertex in set $\mathcal{V}$.
Obviously, an undirected hypergraph can be regarded as a bipartite graph if the
hyperedges are treated as another vertex set, as shown in Fig. 2.5.

Figure 2.5 illustrates examples of converting a hypergraph to a bipartite graph. The
bipartite graph can be generated by two strategies: the vertices and hyperedges
are treated as vertices in $\mathcal{U}$ and vertices in $\mathcal{V}$, respectively
(as illustrated in the left part), or the vertices and hyperedges are treated as
vertices in $\mathcal{V}$ and vertices in $\mathcal{U}$, respectively (as
illustrated in the right part). Similarly, a bipartite graph can also be transformed
to an undirected hypergraph with set $\mathcal{U}$ or $\mathcal{V}$ as the
hyperedges. This does not mean that the hypergraph is the same as or can be
replaced with the bipartite graph. The transformation only exists for the undirected
hypergraph and the probabilistic hypergraph. For more complex hypergraphs such as
the directed hypergraph, the transformation is invalid.

Fig. 2.5 The relationship between hypergraph and bipartite graph
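
The two-way conversion for undirected hypergraphs can be sketched as follows. This
is a minimal illustration under assumed conventions: vertices are 0-based integers,
bipartite nodes are tagged tuples such as `("v", 0)` and `("e", 1)`, and both
function names are hypothetical.

```python
def hypergraph_to_bipartite(num_vertices, hyperedges):
    """Put original vertices in one part and one node per hyperedge in the other."""
    left = [("v", i) for i in range(num_vertices)]
    right = [("e", j) for j in range(len(hyperedges))]
    edges = [(("v", i), ("e", j))
             for j, members in enumerate(hyperedges)
             for i in sorted(members)]
    return left, right, edges

def bipartite_to_hypergraph(right, edges):
    """Treat each node of the hyperedge part as a hyperedge over its neighbours."""
    members = {node: set() for node in right}
    for (_, i), e_node in edges:
        members[e_node].add(i)
    return [members[node] for node in right]

hyperedges = [{0, 1, 2}, {1, 3}, {2, 3, 4}]
left, right, edges = hypergraph_to_bipartite(5, hyperedges)
recovered = bipartite_to_hypergraph(right, edges)
print(len(edges))               # one bipartite edge per vertex-hyperedge membership
print(recovered == hyperedges)  # the round trip recovers the hyperedges
```

The round trip works here because membership is the only information stored on a
hyperedge; a directed hypergraph would additionally need to record whether each
member is a source or a target.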
2.2.6 The Weights on Hypergraph


It is noted that there are different weights on a hypergraph, which provide 
additional information to assign values to a hypergraph structure. This is a more 
semantically preferred way of representing a hypergraph, as different components 
of a hypergraph, such as a vertex, a hyperedge or even a sub-hypergraph, should 
have different impact on the relationship modeling. For example, in a recommender 
system, the weights in the user profile influence the categorization of user attributes. 
If the attributes are not categorized accurately, the accuracy of the recommendations 
and marketing based on the profile could be questionable. The main types of 
weight information on a hypergraph are hyperedge weights and vertex weights, with 
the magnitude of the values indicating the relative importance of hyperedges and 
vertices, respectively. 

First, let us show how the weights on vertices can be used. Different vertices
may have varying importance in hypergraph modeling, and vertex weights are used
in a hypergraph to quantify the importance of different vertices. If a vertex is
strongly connected on the hypergraph (with high correlations), it should have a
large vertex weight. Otherwise, it should have a small vertex weight. A vertex
with a value of 0 in the incidence matrix can also be regarded as being connected
by the corresponding hyperedge with a weight of 0. Here, the diagonal elements of
$\mathbf{U}$ represent the weights of vertices, which are between 0 and 1 and
reveal the relative importance of these vertices. Figure 2.6 shows an example
hypergraph with vertex weights. In this figure, the weight of each vertex is
denoted by the size of the vertex node. Vertex $v_6$ has a weight of 0.9, which is
larger than that of all other vertices, and the weight of vertex $v_2$ is the
smallest among the six vertices.

Then, let us focus on the weights on hyperedges. Hyperedge weights reflect the
importance of different hyperedges in a hypergraph. As different hyperedges may
have different importance in representing connections among vertices, it is crucial
that hyperedges be weighted according to their representative capabilities. In
some cases, some hyperedges are more reliable due to their generation method or
the features employed in the task, and these hyperedges should be given a large
weight during the learning process. Here, the diagonal elements of $\mathbf{W}$
can be used to represent the weights of hyperedges, which are between 0 and 1,
revealing the relative importance of these hyperedges. Figure 2.7 shows an example
of the hyperedge weighted hypergraph. In the illustration, the three hyperedges
have the weights 0.3, 0.9, and 0.5, respectively.

Fig. 2.7 An example of a Hyperedge Weighted Hypergraph


2.3 Comparison Between Graph and Hypergraph 


As hypergraph is a generalization of graph, the relationship between graph and
hypergraph is a fundamental question. In this part, we introduce in detail the
relationship between graph and hypergraph from four aspects, i.e., the order of
correlations, the representation methods, the structure transformation, and random
walks on both of them.


2.3.1 Low-Order Versus High-Order Correlations 


First, we define the interaction as a set $I = [p_0, p_1, \ldots, p_{k-1}]$
containing $k$ basic elements of the system being studied, which can also be called
vertices or nodes. Various real-world interactions can be described by such sets,
such as coauthors of a scientific paper, genes required to perform a specific
function, neurons co-activating during a specific task, and more. We then denote
the order (or dimension) of interactions among vertices as follows: an order-0
interaction is a vertex interacting with itself only, an order-1 interaction is two
vertices interacting with each other, an order-2 interaction is an interaction among
three vertices, and so on. Furthermore, high-order interactions are considered to be
those of order 2 or above, whereas low-order interactions are those of order 0 or 1.

Fig. 2.8 The expressive ability comparison of graph and hypergraph

Figure 2.8 shows the comparison of hypergraph and graph on the modeling 
of different orders of correlations. We notice that a graph can only represent the 
order-1 interactions between two vertices. Different from graph, a hypergraph 
can represent any order-k interactions through its flexible hyperedges. From this
perspective, hypergraph is more effective in modeling high-order correlations among
subjects compared with graph.
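
The order bookkeeping above can be made concrete with a tiny Python sketch. It
assumes the reading in which an interaction among n vertices has order n - 1, so
that a simple-graph edge is an order-1 interaction; the function names and the
example set are hypothetical.

```python
def interaction_order(interaction):
    """Order of an interaction: number of participating vertices minus one."""
    return len(interaction) - 1

def representable_by_single_edge(interaction):
    """A simple-graph edge joins exactly two vertices (an order-1 interaction)."""
    return interaction_order(interaction) == 1

coauthors = {"author_a", "author_b", "author_c"}  # hypothetical three-author paper
print(interaction_order(coauthors))               # 2
print(representable_by_single_edge(coauthors))    # False
```

A hyperedge can hold the whole set `coauthors` directly, whereas a simple graph
would have to break it into pairwise edges.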


2.3.2 Adjacency Matrix Versus Incidence Matrix 

A graph with $N$ vertices can be described by an adjacency matrix
$\mathbf{A} \in \{0, 1\}^{N \times N}$, where $A_{i,j} = 1$ denotes that there is an
edge connecting vertex $v_i$ and vertex $v_j$. In most cases, the adjacency matrix
$\mathbf{A}$ is a symmetric matrix.

A hypergraph with $N$ vertices and $M$ hyperedges can be described by an
incidence matrix $\mathbf{H} \in \{0, 1\}^{N \times M}$, where $H_{i,j} = 1$ denotes
that the hyperedge $e_j$ connects vertex $v_i$.

By comparing the adjacency matrix and the incidence matrix, a graph can be
regarded as a 2-uniform hypergraph. In this case, each hyperedge can only connect
two vertices. Given the possible $N \times N$ order-1 hyperedges in the 2-uniform
hypergraph, they can be directly projected to the $N \times N$ elements in the
adjacency matrix $\mathbf{A}$. The hypergraph incidence matrix and the simple graph
adjacency matrix can be transformed into each other as follows:

$$
\mathbf{H}\mathbf{H}^{\top} = \mathbf{A} + \mathbf{D}, \tag{2.6}
$$

where $\mathbf{D}$ is the diagonal matrix of vertex degrees.
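
Eq. (2.6) can be checked numerically on a small simple graph viewed as a 2-uniform
hypergraph. The edge list below is a hypothetical example, and D is taken to be the
diagonal matrix of vertex degrees.

```python
# A simple graph on 4 vertices; every edge is a hyperedge with two vertices.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
N, M = 4, len(edges)

# Incidence matrix H (N x M): H[v][k] = 1 if vertex v is an endpoint of edge k.
H = [[1 if v in e else 0 for e in edges] for v in range(N)]

# Adjacency matrix A (N x N) and diagonal degree matrix D.
A = [[0] * N for _ in range(N)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
D = [[sum(A[i]) if i == j else 0 for j in range(N)] for i in range(N)]

# H times its transpose: entry (i, j) counts the edges containing both i and j.
HHt = [[sum(H[i][k] * H[j][k] for k in range(M)) for j in range(N)]
       for i in range(N)]
A_plus_D = [[A[i][j] + D[i][j] for j in range(N)] for i in range(N)]

print(HHt == A_plus_D)
```

The diagonal of the product counts the edges incident to each vertex (its degree),
while each off-diagonal entry counts the edges shared by two vertices, which is
exactly the adjacency entry in a simple graph.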
The adjacency matrix for graph and the incidence matrix for hypergraph are
processed differently when confronting multi-modal data or multiple types of
connections. Given $m$ adjacency matrices representing $m$ graphs
$\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_m$, there are two typical ways to
combine these data for graph. The first way is to combine different graphs into one
graph $\mathcal{G}$ and then conduct other tasks. The second way is to conduct the
task in each graph individually and then combine all these results. Figures 2.9 and
2.10 show these two types of methods. In either method, it is required to perform
fusion, either in the graph structure part or in the result part. In recent years, a
series of graph fusion methods [15, 16] have been introduced, while it is still a
challenging task to optimally combine different graphs. In addition, multi-modal
graph fusion has high computational complexity, which may limit its application to
multi-modal data.

Different from the processing method in graph, hypergraph can handle such
different types of connections in an easy and direct way, due to its flexible
hyperedges. As shown in Fig. 2.11, when there are multiple types of connections
available, it is possible to generate multiple hyperedge groups with $m$ incidence
matrices $\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_m$, and these $m$ incidence
matrices can be directly concatenated to generate the overall hypergraph structure
$\mathbf{H}$. In this way, all these multi-modal data or multiple types of
connections can be easily modeled in one hypergraph, and all further processing can
be directly deployed on this hypergraph structure. Under such circumstances, it is
not required to conduct multi-modal fusion in an explicit way, as it can be jointly
included in the hypergraph computation process.
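
The concatenation step can be sketched as a column-wise join of incidence matrices
that share the same vertex set. The two matrices and the function name
`concat_incidence` below are hypothetical; they stand for hyperedge groups built
from two different types of connections.

```python
def concat_incidence(*matrices):
    """Concatenate incidence matrices column-wise; all must share the vertex rows."""
    num_vertices = len(matrices[0])
    if any(len(m) != num_vertices for m in matrices):
        raise ValueError("all incidence matrices need the same number of rows")
    return [sum((m[v] for m in matrices), []) for v in range(num_vertices)]

# Two hyperedge groups over the same four vertices (rows are vertices).
H1 = [[1, 0],
      [1, 1],
      [0, 1],
      [0, 0]]
H2 = [[1],
      [0],
      [1],
      [1]]

H = concat_incidence(H1, H2)
for row in H:
    print(row)
```

The result simply has one column per hyperedge from every group, so no explicit
fusion rule between the groups has to be chosen at this stage.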


2.3.3 Structure Transformation from Hypergraph to Graph 


A hypergraph can encode high-order data correlation (beyond pairwise) using its 
degree-free hyperedges compared to a simple graph, where the degree for all 
edges has to be 2. In a sense, a simple graph can be viewed as a special case, 
where all hyperedges on a hypergraph are of degree 2. Therefore, hypergraph and 
graph are interconvertible. Currently, there are a number of methods for converting
a hypergraph to a simple graph. The common ones are clique expansion, star
expansion, and line expansion, which are shown in Figs. 2.12, 2.13, and 2.14,
respectively.


(1) Clique Expansion 

Figure 2.12 shows an example of transforming a hypergraph to a graph with clique 
expansion. The clique expansion algorithm constructs a graph Y (VY, E*) from 
the original hypergraph 4 (7^, £) by replacing each hyperedge e with edges, whose 
degree is 2, for each pair (u, v) of vertices in the hyperedge [17]: &* — ((u, v) : 
u,v Ee, ecg}. 


Fig. 2.9 An example of the graph structure fusion for the multi-modal data

Fig. 2.10 An example of the results fusion for the multi-modal data


It is interesting to note that the vertices in hyperedge $e$ form a clique in the graph
$\mathcal{G}^x$, which is exactly where the name comes from. $\mathcal{G}^x$ preserves the vertex set
of $\mathcal{G}$, so the edge weights of $\mathcal{G}^x$ should approximate
the higher-order associations of the hyperedges as closely as possible. That is, the difference between the
weight of the edge connecting $u$ and $v$ in $\mathcal{G}^x$ and the weights of the
hyperedges that contain both $u$ and $v$ should be as small as possible. Thus we use the following
formula when assigning weights $w^x(u, v)$ to edges on $\mathcal{G}^x$:


$$ w^x(u, v) = \arg\min_{w^x(u, v)} \sum_{e \in \mathcal{E}: u, v \in e} \left( w^x(u, v) - w(e) \right)^2. \tag{2.7} $$

Hence, clique expansion uses the discriminative model, where every edge in the
clique of $\mathcal{G}^x$ associated with hyperedge $e$ has weight $w(e)$. This criterion has the
following minimizer:


$$ w^x(u, v) = \mu \sum_{e \in \mathcal{E}: u, v \in e} w(e) = \mu \sum_{e \in \mathcal{E}} h(u, e) h(v, e) w(e), \tag{2.8} $$


where $\mu$ is a fixed scalar and $h(u, e)$ is the entry of the incidence matrix for vertex $u$
and hyperedge $e$. Equivalently, from the point of view of edges, the weight
between two vertices $u$ and $v$ is the scaled sum of the weights of all
hyperedges that contain both of them.
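The minimizer in Eq. (2.8) can be computed by a single pass over the hyperedges. The following is a minimal sketch, assuming that a hypergraph is given as a list of (vertex collection, weight) pairs; the function name `clique_expansion`, this input format, and the toy hypergraph are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Hyperedge = Tuple[Iterable[Hashable], float]  # (vertices of e, weight w(e))


def clique_expansion(
    hyperedges: Iterable[Hyperedge], mu: float = 1.0
) -> Dict[FrozenSet, float]:
    """Edge weights of the clique expansion: mu * sum of w(e) over e containing u and v."""
    weights: Dict[FrozenSet, float] = defaultdict(float)
    for vertices, w_e in hyperedges:
        # Every pair of distinct vertices in e becomes an edge of the clique.
        for u, v in combinations(set(vertices), 2):
            weights[frozenset((u, v))] += mu * w_e
    return dict(weights)


# Hypothetical hypergraph: e1 = {v1, v2, v3} with weight 1.0, e2 = {v2, v3} with weight 0.5.
graph = clique_expansion([(["v1", "v2", "v3"], 1.0), (["v2", "v3"], 0.5)])
```

In this toy case the pair (v2, v3) is covered by both hyperedges, so its edge weight accumulates both hyperedge weights, while the other pairs only receive the weight of e1.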


(2) Star Expansion 

Figure 2.13 shows an example of transforming a hypergraph to a graph with
star expansion. By star expansion, a graph $\mathcal{G}^* = (\mathcal{V}^*, \mathcal{E}^*)$ can be constructed from
hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ by regarding every hyperedge $e \in \mathcal{E}$ as a new vertex, thus
$\mathcal{V}^* = \mathcal{V} \cup \mathcal{E}$ [17]. Each vertex in the hyperedge is connected to the new graph
vertex $e$, i.e., $\mathcal{E}^* = \{(u, e) : u \in e, e \in \mathcal{E}\}$.

Fig. 2.12 An example of transforming a hypergraph to a graph with clique expansion

Fig. 2.13 An example of transforming a hypergraph to a graph with star expansion (the virtual nodes in the graph are created from hyperedges $e_1$, $e_2$, and $e_3$)

Fig. 2.14 An example of transforming a hypergraph to a graph with line expansion


There are two types of vertices in graph $\mathcal{G}^*$ (the original vertices and the
vertices created from hyperedges), and each hyperedge in $\mathcal{E}$
corresponds to a star in graph $\mathcal{G}^*$. With star expansion, the scaled hyperedge weight
is assigned to each graph edge $(u, e)$ that corresponds to hyperedge $e \in \mathcal{E}$ as
follows:


$$ w^*(u, e) = w(e)/\delta(e). \tag{2.9} $$

For each vertex representing a hyperedge, the edges connected to it have equal
weights, obtained by dividing the hyperedge weight equally into $\delta(e)$ parts.
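As a minimal sketch of the star expansion and the weight assignment in Eq. (2.9), the following snippet assumes that hyperedges are given as a dictionary from a hyperedge identifier to its vertices and weight; the function name `star_expansion` and the toy data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def star_expansion(
    hyperedges: Dict[Hashable, Tuple[Iterable[Hashable], float]]
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Map each pair (u, e) with u in e to the edge weight w(e) / delta(e)."""
    star_edges = {}
    for e, (vertices, w_e) in hyperedges.items():
        members = set(vertices)
        delta_e = len(members)  # hyperedge degree delta(e)
        for u in members:
            # (u, e) links original vertex u to the new vertex standing for e.
            star_edges[(u, e)] = w_e / delta_e
    return star_edges


# Hypothetical hypergraph with two weighted hyperedges.
edges = star_expansion({"e1": (["v1", "v2", "v3"], 3.0), "e2": (["v2", "v3"], 1.0)})
```

The number of graph edges equals the sum of the hyperedge degrees, and the weights of the edges attached to one hyperedge vertex sum back to $w(e)$.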


(3) Line Expansion 

Figure 2.14 shows an example of transforming a hypergraph to a graph with line
expansion. In the line expansion algorithm, the vertices of the graph $\mathcal{G}^l =
(\mathcal{V}^l, \mathcal{E}^l)$ are constructed by reorganizing the structure of the data stored in the
vertices of the hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Each line vertex $(u, e)$ in $\mathcal{G}^l$ can be viewed
as a vertex in the context of a hyperedge or a hyperedge in the context of a vertex [18].
For each vertex in each hyperedge, a line vertex is created to represent it, so a line vertex
in the line-expanded graph stands for one vertex of the hypergraph within one particular
hyperedge, i.e., $\mathcal{V}^l = \{(u, e) : u \in e, u \in \mathcal{V}, e \in \mathcal{E}\}$. This
means that $|\mathcal{V}^l| = \sum_{e \in \mathcal{E}} \delta(e)$.

Therefore, the line vertices in $\mathcal{G}^l$ that contain the same vertex or the same
hyperedge can be defined as neighbors. Both kinds of connections are considered to
be equally important, so $\mathbf{W}^l = \mathrm{diag}(1, \ldots, 1)$ with size $|\mathcal{V}^l| \times |\mathcal{V}^l|$. The
mapping between a hypergraph $\mathcal{G}$ and its line expansion $\mathcal{G}^l$ is bijective under this
construction.
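The construction of the line vertices and their neighborhood can be illustrated with a short sketch. It assumes that hyperedges are given as a dictionary from hyperedge identifier to member vertices; the function name `line_expansion` and the toy data are hypothetical, and the unit weights correspond to treating both kinds of connections as equally important.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

LineVertex = Tuple[Hashable, Hashable]  # (vertex u, hyperedge e) with u in e


def line_expansion(
    hyperedges: Dict[Hashable, Iterable[Hashable]]
) -> Tuple[List[LineVertex], Set[frozenset]]:
    """Build line vertices (u, e) and connect those sharing a vertex or a hyperedge."""
    line_vertices = [(u, e) for e, members in hyperedges.items() for u in set(members)]
    line_edges = set()
    for a, b in combinations(line_vertices, 2):
        if a[0] == b[0] or a[1] == b[1]:  # same vertex or same hyperedge
            line_edges.add(frozenset((a, b)))
    return line_vertices, line_edges


# Hypothetical hypergraph: e1 = {v1, v2, v3}, e2 = {v2, v3}.
nodes, links = line_expansion({"e1": ["v1", "v2", "v3"], "e2": ["v2", "v3"]})
```

Here the number of line vertices equals the sum of the hyperedge degrees (3 + 2), in agreement with the counting formula above.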


2.3.4 Random Walks on Graph and Hypergraph 


Random walks propagate the information stored in the vertices based on the links
among the vertices in the graph or hypergraph. These links constitute the paths between
different vertices. In the hypergraph, the messages of each vertex's neighbors are
aggregated to update the vertex itself based on the “path” between the central vertex and
each vertex in its neighborhood. A path in a hypergraph between vertices $v_1$ and $v_k$
is defined as a sequence, called a hyperpath [19]:


$$ P(v_1, v_k) = (v_1, e_1, v_2, e_2, \ldots, v_{k-1}, e_{k-1}, v_k), \tag{2.10} $$


where $v_j$ and $v_{j+1}$ both belong to the vertex subset described by hyperedge
$e_j$. In other words, two neighboring vertices on a hyperpath are separated by a hyperedge. In a
hypergraph, messages between vertices are propagated through hyperedges, which
are higher-order relationships than those in graphs. It is first necessary to extend the
Neighbor Relation definition among vertices to the Inter-Neighbor Relation $N$ over
vertex set $\mathcal{V}$ and hyperedge set $\mathcal{E}$ for message propagation from vertex to hyperedge
and from hyperedge to vertex on the hyperpath.


Definition 1 The Inter-Neighbor Relation $N \subseteq \mathcal{V} \times \mathcal{E}$ on a hypergraph $\mathcal{G} =
(\mathcal{V}, \mathcal{E}, \mathbf{W})$ with incidence matrix $\mathbf{H} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{E}|}$ is defined as


$$ N = \{(v, e) \mid \mathbf{H}(v, e) = 1, v \in \mathcal{V} \text{ and } e \in \mathcal{E}\}. \tag{2.11} $$

The hyperedge inter-neighbor set $\mathcal{N}_e(v)$ of vertex $v$ and the vertex inter-neighbor
set $\mathcal{N}_v(e)$ of hyperedge $e$ are defined based on the Inter-Neighbor Relation.


Definition 2 The hyperedge inter-neighbor set of vertex $v \in \mathcal{V}$ is defined as
$$ \mathcal{N}_e(v) = \{e \mid vNe, v \in \mathcal{V} \text{ and } e \in \mathcal{E}\}. \tag{2.12} $$

Definition 3 The vertex inter-neighbor set of hyperedge $e \in \mathcal{E}$ is defined as
$$ \mathcal{N}_v(e) = \{v \mid vNe, v \in \mathcal{V} \text{ and } e \in \mathcal{E}\}. \tag{2.13} $$
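The two inter-neighbor sets can be read directly from the incidence matrix. The following minimal sketch assumes a dense 0/1 incidence matrix stored as a list of rows with integer vertex and hyperedge indices; the function names are hypothetical.

```python
from typing import List, Set

Matrix = List[List[int]]


def hyperedge_neighbors(H: Matrix, v: int) -> Set[int]:
    """Hyperedge inter-neighbor set of vertex v: all e with H[v][e] == 1."""
    return {e for e, h in enumerate(H[v]) if h == 1}


def vertex_neighbors(H: Matrix, e: int) -> Set[int]:
    """Vertex inter-neighbor set of hyperedge e: all v with H[v][e] == 1."""
    return {v for v, row in enumerate(H) if row[e] == 1}


# Toy incidence matrix: 3 vertices (rows) and 2 hyperedges (columns).
H = [[1, 0], [1, 1], [0, 1]]
edges_of_v1 = hyperedge_neighbors(H, 1)
vertices_of_e0 = vertex_neighbors(H, 0)
```

One step of a hyperpath can then be obtained by first looking up the hyperedges of a vertex and then the vertices of one of these hyperedges.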


With hypergraph learning, in contrast to graph learning, data are correlated at
a higher level, and correlation models are expanded to a higher order, resulting in
improved performance in practice. This is only the most apparent difference between
graph and hypergraph. Next, we delve deeper into the relationship between graph
and hypergraph from the point of view of mathematical derivations with the help
of random walks [20] and Markov chains [21]. We then provide a mathematical
comparison between hypergraph and graph. The proof concludes that, from the random
walk perspective, a hypergraph with edge-independent vertex weights is equivalent to
a weighted graph, and a hypergraph with edge-dependent vertex weights cannot be
reduced to a weighted graph.

Two types of hypergraphs can be constructed to accurately represent real-world
correlations, that is, the hypergraph with edge-independent vertex weights and the
hypergraph with edge-dependent vertex weights. By using the binary hypergraph
incidence matrix $\mathbf{H} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{E}|}$, where vertices in each hyperedge share the
same weight, the hypergraph with edge-independent vertex weights ($\mathcal{G}_{in} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}\}$)
can model beyond-pairwise correlations. Alternatively, the weighted hypergraph
incidence matrix $\mathbf{R} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{E}|}$ is used to model the variable correlation intensity
in each hyperedge for the hypergraph with edge-dependent vertex weights ($\mathcal{G}_{de} =
\{\mathcal{V}, \mathcal{E}, \mathbf{W}, \gamma\}$). When hyperedge $e$ includes vertex $v$, $\gamma_e(v)$ denotes
the connection intensity and $w(e)$ the weight of hyperedge $e$.

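In the hypergraph with edge-independent vertex weights, the definitions of the binary
hypergraph incidence matrix $\mathbf{H}$, vertex degree $d(v)$, and hyperedge degree $\delta(e)$ are
the same as in Sect. 2.1. In the hypergraph with edge-dependent vertex weights,
$d(v)$ and $\delta(e)$ are defined as follows:


$$ \begin{cases} d(v) = \sum_{\beta \in \mathcal{N}_e(v)} w(\beta) \\ \delta(e) = \sum_{\alpha \in \mathcal{N}_v(e)} \gamma_e(\alpha) \end{cases} \tag{2.14} $$


where $\mathcal{N}_e(\cdot)$ and $\mathcal{N}_v(\cdot)$ are defined in Eqs. (2.12) and (2.13), respectively.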

Then, we introduce the random walks and the Markov chain in a hypergraph.
First, we define the random walk in a hypergraph following [20–23]. At time
$t$, a random walker at vertex $v_t$ does the following:
• Pick an edge $e$ containing vertex $v_t = v$, with probability $p_{v \to e}$.
• Pick vertex $u$ from $e$, with probability $p_{e \to u}$.
• Move to vertex $v_{t+1} = u$, at time $t + 1$.


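We then define the transition probability $p_{v,u}$ of the corresponding Markov
chain on $\mathcal{V}$ as $p_{v,u} = \sum_{e \in \mathcal{N}_e(v,u)} p_{v \to e} p_{e \to u}$, where $\mathcal{N}_e(v, u) = \mathcal{N}_e(v) \cap \mathcal{N}_e(u)$
denotes the set of hyperedges containing vertices $v$ and $u$ simultaneously. In the
hypergraph with edge-independent vertex weights, we have $p_{v \to e} = w(e)/d(v)$
and $p_{e \to u} = 1/\delta(e)$. The transition probability $p_{v,u}$ can be written as $p_{v,u} =
\sum_{\beta \in \mathcal{N}_e(v,u)} \frac{w(\beta)}{d(v)} \frac{1}{\delta(\beta)}$. In the hypergraph with edge-dependent vertex weights, we have
$p_{v \to e} = w(e)/d(v)$ and $p_{e \to u} = \gamma_e(u)/\delta(e)$, and the transition probability $p_{v,u}$ can
be written as $p_{v,u} = \sum_{\beta \in \mathcal{N}_e(v,u)} \frac{w(\beta)}{d(v)} \frac{\gamma_\beta(u)}{\delta(\beta)}$.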
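Both transition probabilities can be computed with the same routine, since the edge-independent case corresponds to $\gamma_e(u) = 1$ for every vertex in every hyperedge. The sketch below is a minimal illustration that uses exact fractions; the function name `transition_probabilities`, the dictionary-based input format, and the toy hypergraph are hypothetical, and integer weights are assumed.

```python
from fractions import Fraction
from typing import Dict, Hashable, Iterable, Optional

Hyperedges = Dict[Hashable, Iterable[Hashable]]


def transition_probabilities(
    hyperedges: Hyperedges,
    w: Dict[Hashable, int],
    gamma: Optional[Dict[Hashable, Dict[Hashable, int]]] = None,
) -> Dict[Hashable, Dict[Hashable, Fraction]]:
    """p[v][u] = sum over e containing v and u of w(e)/d(v) * gamma_e(u)/delta(e)."""
    edges = {e: list(dict.fromkeys(members)) for e, members in hyperedges.items()}
    if gamma is None:  # edge-independent vertex weights
        gamma = {e: {u: 1 for u in members} for e, members in edges.items()}
    vertices = {v for members in edges.values() for v in members}
    # d(v): total weight of the hyperedges containing v.
    d = {v: sum(Fraction(w[e]) for e, m in edges.items() if v in m) for v in vertices}
    # delta(e): total vertex weight inside hyperedge e.
    delta = {e: sum(Fraction(gamma[e][u]) for u in m) for e, m in edges.items()}
    p = {v: {u: Fraction(0) for u in vertices} for v in vertices}
    for e, members in edges.items():
        for v in members:
            for u in members:  # u == v is allowed: the walker may stay in place
                p[v][u] += Fraction(w[e]) / d[v] * Fraction(gamma[e][u]) / delta[e]
    return p


# Hypothetical hypergraph with three hyperedges of unit weight.
edges = {"e0": ("v0", "v1"), "e1": ("v1", "v2"), "e2": ("v2", "v0")}
w = {"e0": 1, "e1": 1, "e2": 1}
gamma = {"e0": {"v0": 1, "v1": 2}, "e1": {"v1": 1, "v2": 2}, "e2": {"v2": 1, "v0": 2}}
p_in = transition_probabilities(edges, w)         # edge-independent
p_de = transition_probabilities(edges, w, gamma)  # edge-dependent
```

In both cases every row of the transition table sums to one, as required for a Markov chain.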


The following lemmas and definitions are used to compare the graph and the two 
types of hypergraphs [21]. 


Definition 4 Let $M$ be a Markov chain with state space $S$ and transition probability
$p_{x,y}$, for $x, y \in S$. $M$ is said to be reversible if there exists a probability
distribution $\pi$ over $S$ such that $\pi_x p_{x,y} = \pi_y p_{y,x}$ for all $x, y \in S$.


Lemma 5 Let $M$ be an irreducible Markov chain with finite state space $S$ and
transition probability $p_{x,y}$ for $x, y \in S$. $M$ is reversible if and only if there exists a
weighted undirected graph $\mathcal{G}$ with vertex set $S$ such that random walks on $\mathcal{G}$ and $M$
are equivalent.
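The lemma can be checked numerically on a small example. The sketch below builds the weighted graph with edge weights $w_{x,y} = \pi_x p_{x,y}$ from a reversible chain and verifies that the random walk on this graph reproduces the chain. The three-state chain, its distribution `pi`, and the function names are hypothetical and only serve as an illustration.

```python
from fractions import Fraction as F
from typing import List

Matrix = List[List[F]]


def graph_from_reversible_chain(P: Matrix, pi: List[F]) -> Matrix:
    """Edge weights w[x][y] = pi[x] * P[x][y]; symmetric if the chain is reversible."""
    n = len(P)
    W = [[pi[x] * P[x][y] for y in range(n)] for x in range(n)]
    if any(W[x][y] != W[y][x] for x in range(n) for y in range(n)):
        raise ValueError("the chain is not reversible with respect to pi")
    return W


def random_walk_on_graph(W: Matrix) -> Matrix:
    """One-step transition probabilities of the random walk on a weighted graph."""
    return [[w_xy / sum(row) for w_xy in row] for row in W]


# Hypothetical reversible chain on three states and its stationary distribution.
P = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 4), F(0), F(3, 4)],
     [F(2, 5), F(3, 5), F(0)]]
pi = [F(3, 12), F(4, 12), F(5, 12)]
W = graph_from_reversible_chain(P, pi)
P_graph = random_walk_on_graph(W)
```

Because each row of `W` sums to the corresponding entry of `pi`, normalizing the rows recovers the original transition probabilities.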


Proof of Lemma 5 Note that $\pi$ indicates the stationary distribution [21, 24] of a
given edge-independent/edge-dependent hypergraph. The transition probability $p_{v,u}$
of vertices in the hypergraph with edge-independent vertex weights is defined as


$$ p_{v,u} = \sum_{\beta \in \mathcal{N}_e(v,u)} \frac{w(\beta)}{d(v)} \frac{1}{\delta(\beta)}. \tag{2.15} $$


Moreover, the transition probability $p_{v,u}$ of vertices in the hypergraph with
edge-dependent vertex weights is defined as


$$ p_{v,u} = \sum_{\beta \in \mathcal{N}_e(v,u)} \frac{w(\beta)}{d(v)} \frac{\gamma_\beta(u)}{\delta(\beta)}. \tag{2.16} $$


“⇒”: Suppose $M$ is reversible with transition probability $p_{x,y}$. We then construct
a graph $\mathcal{G}$ with vertex set $S$ and edge weights $w_{x,y} = \pi_x p_{x,y}$. Because $M$ is
irreducible, $\pi_x \neq 0$ and $p_{x,y} \neq 0$ for all states $x$ and $y$. Thus, the edge weight
$w_{x,y} \neq 0$ and the graph $\mathcal{G}$ is a connected graph. Due to the reversibility of $M$,
$w_{x,y} = \pi_x p_{x,y} = \pi_y p_{y,x} = w_{y,x}$, so the constructed graph $\mathcal{G}$ is an undirected graph.
Random walks on $\mathcal{G}$ from $x$ to $y$ in one time step satisfy the following:


$$ \frac{w_{x,y}}{\sum_{z \in S} w_{x,z}} = \frac{\pi_x p_{x,y}}{\sum_{z \in S} \pi_x p_{x,z}} = \frac{p_{x,y}}{\sum_{z \in S} p_{x,z}} = p_{x,y}, \tag{2.17} $$


since $\sum_{z \in S} p_{x,z} = 1$. Thus, if $M$ is reversible, the stated claim holds.
“⇐”: Random walks on an undirected graph are always reversible.


Definition 6 A Markov chain is reversible if and only if its transition probability
satisfies


$$ p_{v_1,v_2} p_{v_2,v_3} \cdots p_{v_n,v_1} = p_{v_1,v_n} p_{v_n,v_{n-1}} \cdots p_{v_2,v_1} \tag{2.18} $$


for any finite sequence of states $v_1, v_2, \ldots, v_n \in S$. The definition is also known as
Kolmogorov's criterion. For more detailed proofs, please refer to [25].


Theorem 1 Let $\mathcal{G}_{in} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}\}$ be a hypergraph with edge-independent vertex weights;
then there exists a weighted undirected graph $\mathcal{G}$ such that a random walk on $\mathcal{G}$
is equivalent to a random walk on $\mathcal{G}_{in}$.


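Proof of Theorem 1 The probability $p_{v,u}$ of $\mathcal{G}_{in}$ is defined in Eq. (2.15). By
Definition 6, the following equation can be deduced:


$$ \begin{aligned} p_{v_1,v_2} p_{v_2,v_3} \cdots p_{v_n,v_1}
&= \left( \sum_{\beta \in \mathcal{N}_e(v_1,v_2)} \frac{w(\beta)}{d(v_1)} \frac{1}{\delta(\beta)} \right) \cdots \left( \sum_{\beta \in \mathcal{N}_e(v_n,v_1)} \frac{w(\beta)}{d(v_n)} \frac{1}{\delta(\beta)} \right) \\
&= \left( \frac{1}{d(v_1)} \sum_{\beta \in \mathcal{N}_e(v_1,v_2)} \frac{w(\beta)}{\delta(\beta)} \right) \cdots \left( \frac{1}{d(v_n)} \sum_{\beta \in \mathcal{N}_e(v_n,v_1)} \frac{w(\beta)}{\delta(\beta)} \right) \\
&= \left( \frac{1}{d(v_2)} \sum_{\beta \in \mathcal{N}_e(v_1,v_2)} \frac{w(\beta)}{\delta(\beta)} \right) \cdots \left( \frac{1}{d(v_1)} \sum_{\beta \in \mathcal{N}_e(v_n,v_1)} \frac{w(\beta)}{\delta(\beta)} \right),
\end{aligned} \tag{2.19} $$


where the last step only reorders the factors $1/d(v_i)$ among the terms of the product.
For any $v_i$ and $v_j$, $\sum_{\beta \in \mathcal{N}_e(v_i,v_j)} \frac{w(\beta)}{\delta(\beta)} = \sum_{\beta \in \mathcal{N}_e(v_j,v_i)} \frac{w(\beta)}{\delta(\beta)}$. Thus, the reversibility
can be proven by


$$ \begin{aligned} p_{v_1,v_2} p_{v_2,v_3} \cdots p_{v_n,v_1}
&= \left( \frac{1}{d(v_2)} \sum_{\beta \in \mathcal{N}_e(v_2,v_1)} \frac{w(\beta)}{\delta(\beta)} \right) \cdots \left( \frac{1}{d(v_1)} \sum_{\beta \in \mathcal{N}_e(v_1,v_n)} \frac{w(\beta)}{\delta(\beta)} \right) \\
&= p_{v_2,v_1} p_{v_3,v_2} \cdots p_{v_1,v_n} \\
&= p_{v_1,v_n} p_{v_n,v_{n-1}} \cdots p_{v_2,v_1}.
\end{aligned} \tag{2.20} $$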
Hence, a random walk on $\mathcal{G}_{in}$ is reversible. Furthermore, by Lemma 5, a random
walk on $\mathcal{G}_{in}$ is equivalent to a random walk on a weighted undirected graph $\mathcal{G}$.
The proof of Theorem 1 can be summarized as follows:


1. A random walk on $\mathcal{G}_{in}$ is equivalent to a random walk on a reversible Markov
chain (according to Definition 6).
2. A random walk on a reversible Markov chain is equivalent to a random walk on
a weighted undirected graph $\mathcal{G}$ (according to Lemma 5).


Fig. 2.15 An example of two types of random walks on the hypergraph with edge-independent
vertex weights and the hypergraph with edge-dependent vertex weights. This figure is from [26]


Theorem 2 Let $\mathcal{G}_{de} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}, \gamma\}$ be a hypergraph with edge-dependent
vertex weights; then there does not exist a weighted undirected graph $\mathcal{G}$ such that
a random walk on $\mathcal{G}$ is equivalent to a random walk on $\mathcal{G}_{de}$.


Proof of Theorem 2 Figure 2.15 provides an example in which a random walk on $\mathcal{G}_{de}$
is not equivalent to a random walk on a reversible Markov chain. According to
Lemma 5, which is the second step of Theorem 1's proof, Theorem 2 holds.


A simple illustration is shown in Fig. 2.15 to make it easier to understand. There
is no difference in the connection structure between the two hypergraphs, but there
is a difference in the intensity of the connections. For the two types of hypergraphs, the
transition probabilities $p_{v,u}$ can be computed accordingly. Two
random walks from vertex $v_0$ are then conducted: “$v_0 \to v_1 \to v_2 \to v_0$” and “$v_0 \to
v_2 \to v_1 \to v_0$.” Having obtained $p_{v_0,v_1} p_{v_1,v_2} p_{v_2,v_0}$ and $p_{v_0,v_2} p_{v_2,v_1} p_{v_1,v_0}$ for
the two paths, the cumulative transition probability of each path can be calculated. The
hypergraph with edge-independent vertex weights is reversible according to Theorem 1
and Lemma 5. Thus, the same accumulated transition probability is obtained from the
two mutually reversed paths.
In contrast, two different accumulated transition probabilities are obtained from the
two reversed paths in the hypergraph with edge-dependent vertex weights.
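This comparison can be reproduced with a few lines of code. The sketch below is not the example of Fig. 2.15; it uses a hypothetical hypergraph with three vertices, three hyperedges of unit weight, and hand-picked edge-dependent vertex weights, and it evaluates both sides of Kolmogorov's criterion with exact fractions.

```python
from fractions import Fraction

# Hypothetical hypergraph: three hyperedges, each with weight w(e) = 1.
EDGES = {"e0": ("v0", "v1"), "e1": ("v1", "v2"), "e2": ("v2", "v0")}


def transition(gamma):
    """p[v][u] for unit hyperedge weights and vertex weights gamma[e][u]."""
    vertices = sorted({v for members in EDGES.values() for v in members})
    d = {v: sum(1 for m in EDGES.values() if v in m) for v in vertices}
    p = {v: {u: Fraction(0) for u in vertices} for v in vertices}
    for e, members in EDGES.items():
        delta_e = sum(gamma[e][u] for u in members)
        for v in members:
            for u in members:
                p[v][u] += Fraction(1, d[v]) * Fraction(gamma[e][u], delta_e)
    return p


def cycle_probability(p, path):
    """Product of the one-step transition probabilities along a closed path."""
    prob = Fraction(1)
    for v, u in zip(path, path[1:]):
        prob *= p[v][u]
    return prob


independent = {e: {u: 1 for u in m} for e, m in EDGES.items()}
dependent = {"e0": {"v0": 1, "v1": 2}, "e1": {"v1": 1, "v2": 2}, "e2": {"v2": 1, "v0": 2}}

forward = ["v0", "v1", "v2", "v0"]
backward = forward[::-1]  # v0 -> v2 -> v1 -> v0

in_forward = cycle_probability(transition(independent), forward)
in_backward = cycle_probability(transition(independent), backward)
de_forward = cycle_probability(transition(dependent), forward)
de_backward = cycle_probability(transition(dependent), backward)
```

With edge-independent vertex weights the two products coincide, whereas the chosen edge-dependent vertex weights make the forward cycle more likely than the reversed one, so Kolmogorov's criterion fails.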


2.4 Summary 


In this chapter, we present the mathematical definitions of the foundations of
hypergraph and their interpretation. We then show the representation of the directed
hypergraph, which, unlike the undirected hypergraph, represents the relationships
between vertices within a hyperedge. Finally, we discuss the relationship
between graph and hypergraph from the perspectives of conversion and expressive ability.
The most intuitive differences between graph and hypergraph can be seen in low-order
versus high-order representations and adjacency matrix versus incidence
matrix. Clique expansion, star expansion, and line expansion are methods for
converting a hypergraph into a simple graph. We also show the relationship between
graph and hypergraph from the random walk view. In terms of the information
propagation process, a hypergraph with edge-independent vertex weights is equivalent
to a weighted graph, and a hypergraph with edge-dependent vertex weights cannot be
reduced to a weighted graph.


Chapter 3
Hypergraph Computation Paradigms


Abstract This chapter introduces three hypergraph computation paradigms,
including intra-hypergraph computation, inter-hypergraph computation, and
hypergraph structure computation. Intra-hypergraph computation
aims to conduct representation learning of a hypergraph, where each subject is
represented by a hypergraph of its components. Inter-hypergraph computation
conducts representation learning of vertices in the hypergraph, where each subject
is a vertex in the hypergraph. Hypergraph structure computation conducts
hypergraph structure prediction, which aims to find the connections among vertices.
This chapter is a general introduction to hypergraph computation paradigms and
shows how to formulate a task in the hypergraph computation framework.


3.1 Introduction 


Hypergraph computation can be roughly divided into three types: representation
learning of a hypergraph, where each subject is represented by a hypergraph of
its components; representation learning of vertices in the hypergraph, where each
subject is a vertex in the hypergraph; and hypergraph structure prediction, which
aims to find the connections among vertices. The corresponding three computation
paradigms are named intra-hypergraph computation, inter-hypergraph
computation, and hypergraph structure computation, respectively. In this chapter, we introduce
the generalized computation paradigms corresponding to these three directions and
show how to formulate practical tasks in these hypergraph computation frameworks.
We note that specific implementations of the generalized functions in the paradigms are
not introduced here, as they are parts of specifically defined functions or modules
in the hypergraph computation framework and will be introduced in subsequent
chapters.


© The Author(s) 2023
Q. Dai, Y. Gao, Hypergraph Computation, Artificial Intelligence: Foundations, 
Theory, and Algorithms, https://doi.org/10.1007/978-981-99-0185-2_3 


3.2 Intra-hypergraph Computation 


Intra-hypergraph computation targets learning the representation of a single
subject using its internal component information, in which the correlations among
the components of this subject are formulated in a hypergraph. In this hypergraph,
the components of this subject are regarded as the set of vertices, and their high-order
correlations are modeled by hyperedges. In this way, the individual subject
is transformed into a hypergraph. As this hypergraph is generated by the subject’s
components themselves, we name it the intra-hypergraph of
this subject.

Image representation and understanding [1–3] are typical intra-hypergraph
computation applications. For example, an image can be split into a group of patches,
and each patch is denoted by a vertex in the hypergraph. The hypergraph can be
generated according to the semantic and spatial information of these patches. The
information of these patches and their high-order correlations can then be used
simultaneously to learn the representation of the image.

The general paradigm of intra-hypergraph computation can be described as
follows. Given a target subject that contains $n$ components, which are represented
by feature vectors $\mathbf{X} \in \mathbb{R}^{n \times d}$, an intra-hypergraph $\mathcal{G}$ can be generated to formulate
the high-order correlations inside the subject, whose incidence matrix is denoted by
$\mathbf{H}$. The representation of the individual subject can be learned by


$$ \mathbf{Z}_g = f_\Theta(\mathbf{H}, \mathbf{X}), \tag{3.1} $$


where $\Theta$ denotes the to-be-learned parameters. The function $f_\Theta(\cdot)$ can be
neural network layers or other computing operators that aggregate the information of
vertices based on the hypergraph structure. Intra-hypergraph computation
integrates the complex correlations among components into the learned representation,
which can extract more information than simple aggregation operations.

In this paradigm, the subject to be analyzed is regarded as a whole system, and
the intra-hypergraph models the correlations inside the system. This process is
shown in Fig. 3.1.
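To make the paradigm in Eq. (3.1) more concrete, the following sketch shows one possible instance of $f_\Theta$. It is not the implementation used in this book: the two-stage mean aggregation (vertex to hyperedge, then hyperedge to vertex), the linear projection by a matrix `theta`, and the mean pooling over vertices are all assumptions made for illustration, and every function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product of two dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def hypergraph_smooth(H: Matrix, X: Matrix) -> Matrix:
    """Vertex -> hyperedge -> vertex mean aggregation.

    Assumes every hyperedge is non-empty and every vertex lies in a hyperedge.
    """
    n, m, dim = len(H), len(H[0]), len(X[0])
    # Hyperedge features: mean of the features of the vertices in each hyperedge.
    E = []
    for e in range(m):
        members = [v for v in range(n) if H[v][e]]
        E.append([sum(X[v][k] for v in members) / len(members) for k in range(dim)])
    # Vertex features: mean of the features of the hyperedges of each vertex.
    out = []
    for v in range(n):
        incident = [e for e in range(m) if H[v][e]]
        out.append([sum(E[e][k] for e in incident) / len(incident) for k in range(dim)])
    return out


def intra_hypergraph_embedding(H: Matrix, X: Matrix, theta: Matrix) -> List[float]:
    """Subject-level representation: smooth, project with theta, mean-pool over vertices."""
    Z = matmul(hypergraph_smooth(H, X), theta)
    return [sum(col) / len(Z) for col in zip(*Z)]


# Hypothetical subject with 3 components (e.g., image patches) and 2 hyperedges.
H = [[1, 0], [1, 1], [0, 1]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [[1.0, 0.0], [0.0, 1.0]]  # identity in place of learned parameters
z = intra_hypergraph_embedding(H, X, theta)
```

The output is a single vector for the whole subject, which is what distinguishes this paradigm from vertex-level representation learning.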


3.3 Inter-hypergraph Computation 


Inter-hypergraph computation targets learning the representation of a subject by
considering its correlations with other subjects. In this hypergraph, each subject,
including the target one, is regarded as a vertex, and the high-order
correlations among subjects are modeled by hyperedges. In this way, this group of subjects is
transformed into a hypergraph. As this hypergraph is generated by the cross-subject
correlations, we name it the inter-hypergraph of this subject.
Subject classification and retrieval [4–7] are typical inter-hypergraph computation
applications. For example, we take an image as the target subject, together with
a pool of other images for processing. Each image can be denoted by a vertex in the
hypergraph. The hypergraph can be generated according to the semantic and spatial
information of these images. The information of these images and their high-order
correlations can then be used simultaneously to learn the representation of the target
image.

The general paradigm of inter-hypergraph computation can be described as
follows. Given a target subject and $n - 1$ other subjects, represented by feature
vectors $\mathbf{X} \in \mathbb{R}^{n \times d}$, an inter-hypergraph $\mathcal{G}$ can be generated to formulate the
high-order correlations among these subjects, whose incidence matrix is denoted by $\mathbf{H}$.
The representation of the target subject can be learned by


$$ \mathbf{Z}_v = f_\Theta(\mathbf{H}, \mathbf{X}). \tag{3.2} $$


The vertex embedding can be further used for downstream tasks, such as vertex
classification, where the vertices are associated with pre-defined labels $\mathbf{Y} \in [K]^n$.
This process is also shown in Fig. 3.1.

It is noted that a hypergraph structure can be either homogeneous or heterogeneous,
depending on the definition of vertices. Given multiple types of data,
or multi-modal data, another way to formulate such correlations is to generate
multiple hypergraphs accordingly. For example, supposing that there are $m$ types
of features or modalities, denoted by $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_m$, we can construct one
hypergraph for each modality. In this way, we can have $m$ hypergraphs
$\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1, \mathbf{W}_1), \mathcal{G}_2 = (\mathcal{V}_2, \mathcal{E}_2, \mathbf{W}_2), \ldots, \mathcal{G}_m = (\mathcal{V}_m, \mathcal{E}_m, \mathbf{W}_m)$ for the data with
$m$ modalities. The general paradigm for multi-modal inter-hypergraph computation
can be described as


$$ \mathbf{Z}_v = f_\Theta(\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_m, \mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_m), \tag{3.3} $$


where $\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_m$ are the incidence matrices of the $m$ hypergraphs.


3.4 Hypergraph Structure Computation 


Hypergraph structure computation aims to learn the high-order correlations among
data in the presence of missing links and inaccurate initial structure. There are
two scenarios in which hypergraph structure computation is performed: either the
set of hyperedges is incomplete or the affiliation relationships between vertices
and hyperedges are incomplete. Recommender systems and drug discovery [8–10]
are typical hypergraph structure computation applications. For example, in a
recommender system, the hyperedges describe the connections between items and
users with specific semantics. The number of hyperedges is fixed, and the features
of both vertices and hyperedges can be obtained as the input. Here, the target
of hypergraph structure computation is to predict whether a vertex belongs to a
hyperedge or not. If a vertex is predicted to belong to a hyperedge, a new link is
added to indicate the connection. In a knowledge hypergraph, by contrast, the hyperedges
represent facts in the real world, which are usually highly incomplete. The missing links are
expected to be inferred from existing links by hypergraph structure computation.
Therefore, in the second case, the objective of hypergraph structure computation is
not only to optimize existing links but also to infer the unobserved links.

In the following, we describe the computation paradigms of these two cases
separately. The first scenario is that the set of hyperedges is complete and the
affiliation relationships between vertices and hyperedges are incomplete. In this
case, we can usually extract a feature vector to represent each hyperedge.
Given the input of vertex features $\mathbf{X}_v$ and hyperedge features $\mathbf{X}_e$, we can calculate
the incidence matrix by a function of the vertex and hyperedge features as


$$ \mathbf{H}^* = f_\Theta(\mathbf{X}_v, \mathbf{X}_e). \tag{3.4} $$


For example, the attention score can be used as an instance of the function in
practice.
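As a minimal sketch of how an attention score could instantiate Eq. (3.4), the snippet below scores each vertex-hyperedge pair by a scaled dot product, squashes the score with a sigmoid, and thresholds it to obtain a binary incidence matrix. The scaled dot product, the sigmoid, the threshold, and the absence of learnable parameters are all assumptions made for illustration; the function name and the toy features are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attention_incidence(
    X_v: Sequence[Sequence[float]],
    X_e: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return soft scores and a binary incidence matrix from vertex/hyperedge features."""
    dim = len(X_e[0])
    scores = [
        [
            # Sigmoid of the scaled dot product between vertex and hyperedge features.
            1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x_v, x_e)) / math.sqrt(dim)))
            for x_e in X_e
        ]
        for x_v in X_v
    ]
    # A vertex is assigned to a hyperedge when its score exceeds the threshold.
    H = [[1 if s > threshold else 0 for s in row] for row in scores]
    return scores, H


# Hypothetical features: 3 vertices and 2 hyperedges in a 2-dimensional space.
X_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_e = [[2.0, 0.0], [0.0, -2.0]]
scores, H_star = attention_incidence(X_v, X_e)
```

The soft scores could also be kept as a weighted incidence matrix instead of being thresholded, depending on the downstream use.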

In the second scenario, if there are missing hyperedges in the observed hypergraph
and the semantics of hyperedges are ambiguous, it is difficult to directly
describe the hyperedges by features. Consequently, only the initial incomplete
hypergraph structure and the features of vertices are available as the input.
We denote the incidence matrix of the initial hypergraph structure by $\mathbf{H}^{(0)}$. The
computation paradigm can be written as


$$ \mathbf{H}^* = f_\Theta(\mathbf{X}_v, \mathbf{H}^{(0)}), \tag{3.5} $$


which indicates that the new hypergraph structure is updated based on the original
hypergraph structure following specific prior information.

To guide the evolution of the hypergraph structure toward modeling data
correlation more accurately, it is necessary to evaluate the quality of the hypergraph structure based on
the training data and prior information. If partial ground truth information
about the hypergraph structure is available, the performance of correlation modeling can be
evaluated directly. However, there is no gold standard for hypergraph structure
in most cases. Therefore, we may need to perform downstream tasks using the new
hypergraph and indirectly evaluate hypergraph computation performance through
the downstream task results. As shown in Fig. 3.1, hypergraph structure
computation can be conducted under both the intra- and inter-hypergraph computation
frameworks.


3.5 Summary


In this chapter, we introduce three hypergraph computation paradigms for different
scenarios. These three paradigms are intra-hypergraph computation,
inter-hypergraph computation, and hypergraph structure computation, which focus,
respectively, on learning the representation of a single subject using its internal
component information, learning the representation of a subject by considering its correlations with
other subjects, and learning the high-order correlations among data in the presence
of missing links and inaccurate initial structure. This chapter provides an overview
of how to use hypergraph computation, and the detailed hypergraph computation
theory, methods, and applications will be introduced in the following chapters.


Chapter 4
Hypergraph Modeling


Abstract Hypergraph modeling is the fundamental task in hypergraph computation,
which aims to establish a high-quality hypergraph structure that accurately
formulates the high-order correlation among data. In this chapter, we introduce
different hypergraph modeling methods to show how to build hypergraphs using
various pieces of information, such as features, attributes, and/or graphs. These
methods are organized into two broad categories, depending on whether the
correlations are explicit or implicit, to distinguish their similarities and differences.
We then further discuss different hypergraph structure optimization and generation
methods, such as adaptive hypergraph modeling, generative hypergraph modeling,
and knowledge hypergraph generation.


4.1 Introduction 


Although there are complex correlations among data in many applications, it is
often difficult to discover such correlations due to the limitations
of observation technologies. Taking social networks as an example, the group
information is a kind of high-order correlation that connects a number of people
based on certain criteria. However, it is intractable to investigate all the groups
when there are millions or even billions of vertices in a social network. Another
typical example is the human brain network. Some functions of the brain
are implemented by the communications among multiple brain regions rather than
just two regions, which means that there exist high-order correlations among brain
regions. Nevertheless, considerable manpower and material resources would be required
to directly record such high-order correlations by neuroscience experiments. Therefore,
it is necessary to study how to model such high-order correlations based on
existing information in practical applications.

Hypergraph has shown its superiority in high-order correlation modeling.
Hypergraph structure generation has attracted much attention and is still an open
problem due to complex correlations among non-standard data. In this chapter, we
systematically review the existing hypergraph modeling methods, including both
the implicit hypergraph modeling strategy and the explicit hypergraph modeling
strategy. The implicit hypergraph modeling strategy aims to generate the hypergraph
structure using vertex representations based on their distances or similarities, in
which the correlations are not directly provided. The explicit hypergraph modeling
strategy targets data with explicit high-order correlation information, such as
attributes and pairwise connections. For the implicit hypergraph modeling strategy,
we mainly introduce the distance-based and representation-based hypergraph structure
generation methods. For the explicit hypergraph modeling strategy, we focus
on the attribute-based and the network-based hypergraph generation approaches.
Figure 4.1 illustrates the hypergraph modeling methods.

Fig. 4.1 Different categories of hypergraph modeling methods

We further give four examples of hypergraph modeling in computer vision,
recommender systems, computer-aided diagnosis, and brain networks in this
chapter. In the last part, we discuss topics for further research on hypergraph
modeling, which have the potential to overcome the limitations of current
methods that are difficult to adapt to complex data. Part of the work introduced
in this chapter has been published in [1–4].


4.2 Implicit Hypergraph Modeling 


In implicit hypergraph modeling, the correlations among data are not directly provided.
Under such circumstances, we need to explore different representations of the
data to build the correlations. Two typical methods for implicit hypergraph modeling
are distance-based methods and representation-based methods. In distance-based
methods, we can explore the neighborhood information for each sample in some
specific feature spaces, and the samples with high similarity/low distance can be
connected by a corresponding hyperedge. In representation-based methods, the
representation relationships among the feature vectors of the samples are used to measure
the neighborhood information, which can be used to generate hyperedges.


4.2.1 Distance-Based Hypergraph Generation


Distance-based hypergraph generation methods construct the hyperedges based on
the distances among all the vertices in the feature space. In general, the construction of
the hypergraph can be divided into two steps: the incidence matrix generation and
the hyperedge weight generation. For the incidence matrix generation, the connectivity
on the hypergraph, i.e., the hyperedge, is determined by considering
the neighborhood relationships, where the neighbors of the vertices in the feature
space are connected by these hyperedges. For the hyperedge weight generation, the
weights of these hyperedges are calculated based on the distance information.

The incidence matrix is generated based on the neighbors of the vertices. There
are two major approaches to determine the neighbors [1], i.e., the nearest-neighbor-based
hyperedge generation strategy (shown in Fig. 4.2) and the clustering-based
hyperedge generation strategy (shown in Fig. 4.3). The nearest-neighbor-based hyperedge
generation strategy searches for the nearest vertices of a given vertex, i.e., the
centroid, and connects these vertices by a hyperedge. The clustering-based
hyperedge generation strategy groups the vertices by their features and constructs a
hyperedge to connect all vertices that fall into the same cluster.

The nearest-neighbor-based hyperedge generation strategy starts by calculating
the distances between all pairs of vertices in the feature space. Subsequently,
two commonly used criteria [5] are applied to determine the neighbors of the
given centroid, i.e., the $k$-NN neighbors [6] and the $\epsilon$-ball neighbors [2]. The given
centroid and the selected neighbors are connected together by a hyperedge.

Fig. 4.2 An illustration of the nearest-neighbor-based hyperedge generation strategy. (a) shows
the $k$-NN neighbors of the given vertex, and (b) shows the $\epsilon$-ball neighbors

Fig. 4.3 Illustration of the cluster-based hyperedge generation strategy. This figure is from [1]

Here we denote $\mathcal{V}$ as the vertex set, $u \in \mathcal{V}$ as the given centroid, $X(u)$ as the
feature vector of $u$, $d(x_1, x_2) = \|x_1 - x_2\|_2$ as the Euclidean distance between the
vectors $x_1$ and $x_2$, $\mathcal{N}_k(u)$ as the k-NN neighbor set of $u$, and $\mathcal{N}_\epsilon(u)$ as the ε-ball
neighbor set of $u$. $\mathcal{N}_k(u)$ contains the $k$ vertices with the smallest distance to $u$, while
$\mathcal{N}_\epsilon(u)$ contains the vertices with distance smaller than $\epsilon$, i.e.,

$$\mathcal{N}_\epsilon(u) = \{v \mid d(X(u), X(v)) < \epsilon\}. \tag{4.1}$$

The vertex $u$ and the neighbors $\mathcal{N}(u)$ (either $\mathcal{N}_k(u)$ or $\mathcal{N}_\epsilon(u)$) are grouped together
to generate a hyperedge $e(u)$:

$$e(u) = \mathcal{N}(u) \cup \{u\}, \tag{4.2}$$

and the hyperedge set $\mathcal{E}$ is formulated as

$$\mathcal{E} = \{e(u) \mid u \in \mathcal{V}\}. \tag{4.3}$$
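
The two neighbor criteria of Eqs. (4.1)-(4.3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, vertices are identified by their row index in the feature matrix `X`, and the brute-force distance computation is only suitable for small data.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances d(X(u), X(v)) between all rows of X, shape (n, n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_hyperedges(X, k):
    """One hyperedge per centroid u: u together with its k nearest neighbors."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a vertex is not counted as its own neighbor
    edges = []
    for u in range(len(X)):
        neighbors = np.argsort(D[u])[:k]
        edges.append(set(neighbors.tolist()) | {u})  # e(u) = N_k(u) ∪ {u}
    return edges

def eps_ball_hyperedges(X, eps):
    """One hyperedge per centroid u: u together with all vertices closer than eps."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)
    return [set(np.flatnonzero(D[u] < eps).tolist()) | {u} for u in range(len(X))]

# Toy data (hypothetical): two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
knn_edges = knn_hyperedges(X, k=1)
ball_edges = eps_ball_hyperedges(X, eps=2.0)
```

Note that every k-NN hyperedge has exactly k + 1 vertices, whereas the size of an ε-ball hyperedge varies with the local density of the data.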


The clustering-based hyperedge generation strategy starts out with grouping the
vertices according to the corresponding features using a clustering algorithm,
such as k-means. Subsequently, the vertices belonging to the same cluster are
connected together using a hyperedge. Here we assume that the k-means algorithm
clusters the vertex set $\mathcal{V}$ into $K$ groups $\mathcal{V}_1, \ldots, \mathcal{V}_K$. Then, $K$ hyperedges can be
constructed using these clustering results:

$$\forall\, 1 \le k \le K, \quad e_k = \mathcal{V}_k = \{v_{k_1}, v_{k_2}, \ldots\}, \tag{4.4}$$

and the hyperedge set $\mathcal{E}$ is formulated as

$$\mathcal{E} = \{e_k \mid 1 \le k \le K\}. \tag{4.5}$$
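
The clustering-based strategy of Eqs. (4.4)-(4.5) can be sketched as follows. The k-means routine is a deliberately small Lloyd-style loop written only to keep the example self-contained; the function names, the random initialization, and the choice to drop empty clusters are assumptions of this sketch, and any clustering algorithm could be substituted.

```python
import numpy as np

def kmeans_labels(X, K, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns a cluster index for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return labels

def clustering_hyperedges(X, K, seed=0):
    """One hyperedge e_k per non-empty cluster V_k."""
    labels = kmeans_labels(X, K, seed=seed)
    edges = [set(np.flatnonzero(labels == k).tolist()) for k in range(K)]
    return [e for e in edges if e]

# Toy data (hypothetical): two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
edges = clustering_hyperedges(X, K=2)
```

Unlike the nearest-neighbor strategy, the hyperedges produced here are disjoint, and their number is K rather than the number of vertices.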


Besides the similarity/distance in the feature space, other types of information, 
which can be used to measure the correlation in some specific space, such as the 
spatial information, can also be applied for hyperedge generation. For example, 
the spatial information of pixels in an image can be used to select a group of 
neighbor pixels for one centroid, which can be connected by a hyperedge, as shown 
in Fig. 4.4.

Fig. 4.4 An illustration of using spatial information of pixels to generate a hyperedge


Typically, an incidence matrix $H$ is used to represent the structure of the
hypergraph, i.e.,

$$H(v, e) = \begin{cases} 1 & \text{if } v \in e \\ 0 & \text{otherwise} \end{cases} \tag{4.6}$$

where $v \in \mathcal{V}$ and $e \in \mathcal{E}$.
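
As a small illustration of Eq. (4.6), the sketch below turns a list of hyperedges into a binary incidence matrix with one row per vertex and one column per hyperedge. The function name and the toy hyperedges are hypothetical.

```python
import numpy as np

def incidence_matrix(n_vertices, hyperedges):
    """H[v, j] = 1 if vertex v belongs to the j-th hyperedge, else 0."""
    H = np.zeros((n_vertices, len(hyperedges)), dtype=int)
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[v, j] = 1
    return H

# Toy example (hypothetical): 4 vertices, 2 hyperedges.
H = incidence_matrix(4, [{0, 1, 2}, {2, 3}])
vertex_degrees = H.sum(axis=1)  # number of hyperedges each vertex belongs to
edge_degrees = H.sum(axis=0)    # number of vertices in each hyperedge
```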

The weight matrix of the hypergraph represents the importance of each hyper- 
edge. A commonly used method for the hyperedge weight measurement is based on 
the Gaussian kernel, where the scores of each pair of vertices belonging to the same 
hyperedge are calculated using the distance between the vertices in the pair and the 
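average score can be used as the weight of the hyperedge, i.e.,

$$w(e) = \frac{1}{\delta(e)(\delta(e) - 1)} \sum_{u, v \in e} \exp\left(-\frac{d(X(u), X(v))^2}{\sigma^2}\right), \tag{4.7}$$

where $w(e)$ denotes the weight of hyperedge $e$, $\delta(e)$ is the number of vertices in $e$,
and $\sigma$ is the bandwidth of the Gaussian kernel.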

In this way, if the vertices connected by a hyperedge are relatively similar,
the corresponding hyperedge weight is larger, and vice versa. The hyperedge
weights can therefore indicate how trustworthy each hyperedge is for further
processing.

In practice, $\sigma$ can be set as the median value of the distances among all vertices
by

$$\sigma = \operatorname{median}_{u, v \in \mathcal{V}}\, d(X(u), X(v)), \tag{4.8}$$

where median denotes the median value. It is noted that the hyperedge weight can
be set in other ways following the purpose of evaluating the importance of each
hyperedge.
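
The Gaussian-kernel weight and the median bandwidth can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: the median is taken over distinct vertex pairs only, and a hyperedge with fewer than two vertices is given weight 1. The function names are hypothetical.

```python
import numpy as np

def median_sigma(X):
    """Median of the pairwise Euclidean distances over distinct vertex pairs."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    upper = np.triu_indices(len(X), k=1)  # each unordered pair once
    return float(np.median(D[upper]))

def hyperedge_weight(X, edge, sigma):
    """Average Gaussian-kernel score over all pairs of distinct vertices in edge."""
    idx = sorted(edge)
    m = len(idx)
    if m < 2:
        return 1.0  # assumption: no pair to average over
    feats = X[idx]
    D = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    scores = np.exp(-(D ** 2) / sigma ** 2)
    # The diagonal holds exp(0) = 1 for every vertex; remove it before averaging.
    return float((scores.sum() - m) / (m * (m - 1)))

# Toy data (hypothetical): a tight hyperedge and a loose one.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
sigma = median_sigma(X)
w_tight = hyperedge_weight(X, {0, 1}, sigma)
w_loose = hyperedge_weight(X, {0, 2}, sigma)
```

A hyperedge whose vertices lie close together in the feature space receives a weight near 1, while a hyperedge that spans distant vertices receives a weight near 0.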

The main limitation of the distance-based hypergraph generation method is that
distances can be inaccurate due to noise and outliers in the data, which may in turn
introduce noise into the hypergraph structure. In practice, finding a good feature
representation for the data is itself a challenging task, and effective feature
extraction is not easy in every application scenario. The metric for distance
calculation also matters. Although the Euclidean distance is commonly used, other
metrics exist, such as the $L_1$-norm and the negative cosine distance. Choosing
among these metrics requires experimental evaluation. Therefore, the distance-based
hypergraph generation method may suffer under such circumstances.

The nearest-neighbor-based hyperedge generation strategy is the simplest
one to deploy in practice. The limitations of this strategy are as follows.
First, the hyperparameter, i.e., k for the k-NN neighbors and ε for the ε-ball
neighbors, may significantly affect the structure of the hypergraph and further
influence the performance of hypergraph learning. Unfortunately, there are still no
general principles for the selection of k and ε, and the adaptive adjustment of
these hyperparameters is not trivial in practice. Second, the calculation of the k-NN
neighbors is expensive for large-scale data in both time and memory.

Regarding the clustering-based hyperedge generation strategy, there is no common
way to determine how many clusters the vertex set should be divided into, and the
scale of the clustering results also affects the structure of the hypergraph. A possible
solution is to conduct clustering multiple times at different scales, which generates
multiple hypergraphs with different K values, and then to combine these hypergraphs
for a multilevel representation.


4.2.2 Representation-Based Hypergraph Generation 


As introduced above, the distance-based hypergraph generation has some disadvantages.
The k-NN hypergraph, in which each hyperedge connects a centroid sample
and its k nearest samples, is uniform, so its structure may not be sufficiently adaptive.
Also, the distance-based hypergraph is sensitive to noise. To address these problems, the
hypergraph can be generated by the representation-based methods.

Different from the distance-based methods, which generate hyperedges through
some metrics in the feature space, the representation-based methods derive the
relations among the vertices from feature reconstruction, as shown in Fig. 4.5.
Different reconstruction strategies lead to different generation effects. Here we
introduce three main branches of representation-based hypergraph construction, i.e.,
the $\ell_1$-hypergraph [7], the $\ell_2$-hypergraph [8], and the combination of them both. The details
of these methods are described as follows.


Fig. 4.5 An illustration of the representation-based methods


(1) $\ell_1$-Hypergraph Generation

For the $\ell_1$-hypergraph construction, as introduced in [7], the sparse representation
method can be used to formulate the relation between the hyperedge and its vertices,
and the sparse representation is embodied in the coefficients that linearly combine
the basis vectors to reconstruct the input vector. In the hyperedge construction, the
centroid vertex is reconstructed by the other vertices in the same hyperedge. We use
the coefficients to represent the incidence matrix of the hypergraph. Mathematically, we
denote the centroid vertex in the $\ell_1$-hypergraph by $v_c$, and it can be represented as

$$\arg\min_{z} \|Bz - X(v_c)\|_2^2 + \gamma \|z\|_1, \quad \text{s.t. } \forall i,\ z_i \ge 0, \tag{4.9}$$


where $X(v_c)$ denotes the feature vector of the centroid vertex, $B$ denotes the features
of its $k$ nearest vertices, and $z$ is the reconstruction coefficient vector. The first term
in Eq. (4.9) is the reconstruction term, which encourages a good representation of the input
vector $X(v_c)$ by the basis vectors $B$. The second term is the $\ell_1$-regularization,
which forces the coefficient vector $z$ to be sparse. $\gamma$ is a hyperparameter that balances the
influences of the two terms. The constraint $z_i \ge 0$ makes the reconstruction
coefficients non-negative. Note that each sample may act as a centroid vertex
to generate a hyperedge. For a dataset containing $n$ samples, the optimization
problem is solved $n$ times. The non-zero reconstruction coefficients in the
representation can be seen as the connection weights of the neighborhood vertices in
the hyperedge, and the neighborhood vertices with zero reconstruction coefficients
are outside of the hyperedge. The connection weight between the hyperedge and the
neighborhood vertices can be set as the vector of coefficients $z$. The incidence matrix
$H$ of this hypergraph is defined as


$$H(v_j, e_i) = \begin{cases} z_i^j & \text{if } v_j \in e_i \\ 0 & \text{otherwise} \end{cases}, \tag{4.10}$$

where $e_i$ is generated with the centroid vertex $v_i$, and $z_i^j$ is the $j$th element of
the representation coefficients $z$ computed for $v_i$.
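
To make Eqs. (4.9)-(4.10) concrete, the sketch below solves the non-negative sparse reconstruction with a simple projected proximal-gradient loop and fills the incidence matrix column by column. The solver choice, the function names, and the iteration count are assumptions of this sketch, not the method of [7]; any non-negative lasso solver could be used instead. The entry of the centroid itself is left at zero because the text does not specify its value.

```python
import numpy as np

def nonneg_lasso(B, x, gamma, n_iter=500):
    """Minimize ||B z - x||_2^2 + gamma * ||z||_1 subject to z >= 0."""
    L = 2.0 * np.linalg.norm(B, 2) ** 2  # Lipschitz constant of the gradient
    step = 1.0 / max(L, 1e-12)
    z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * B.T @ (B @ z - x)
        # For z >= 0 the l1 term is linear, so the proximal step reduces to
        # a shifted gradient step followed by projection onto z >= 0.
        z = np.maximum(0.0, z - step * (grad + gamma))
    return z

def l1_hypergraph(X, k, gamma):
    """Rows index vertices v_j, columns index hyperedges e_i (centroid v_i)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    H = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(D[i])[:k]  # candidate basis: k nearest vertices
        z = nonneg_lasso(X[neighbors].T, X[i], gamma)
        H[neighbors, i] = z               # zero coefficients stay outside e_i
    return H

# Toy data (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
H = l1_hypergraph(X, k=3, gamma=0.01)
```

Larger values of `gamma` drive more coefficients to exactly zero, so the resulting hyperedges contain fewer vertices.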

(2) Elastic-Hypergraph Generation

The $\ell_1$-regularization in the $\ell_1$-hypergraph can generate sparse and effective hypergraphs,
although it has difficulty revealing the grouping information of samples.
To enhance the effect of grouping, the elastic net [9] is introduced to combine an
$\ell_2$-norm penalty with the $\ell_1$-norm constraint. The objective function of the elastic net
can be formulated as

$$\arg\min_{z} \|Bz - X(v_c)\|_2^2 + \gamma \|z\|_1 + \beta \|z\|_2^2, \quad \text{s.t. } \forall i,\ z_i \ge 0. \tag{4.11}$$

By using both the $\ell_2$-norm and the $\ell_1$-norm penalties, the elastic net groups more
relevant and important neighbors into a hyperedge, whose weights can be determined
by the reconstruction coefficients.
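
The elastic-net objective of Eq. (4.11) differs from the $\ell_1$ case only by the extra quadratic penalty, which can be seen in the following sketch. The projected proximal-gradient solver and the function name are assumptions chosen for brevity, not the solver used in [9].

```python
import numpy as np

def nonneg_elastic_net(B, x, gamma, beta, n_iter=500):
    """Minimize ||B z - x||_2^2 + gamma*||z||_1 + beta*||z||_2^2 subject to z >= 0."""
    L = 2.0 * np.linalg.norm(B, 2) ** 2 + 2.0 * beta  # gradient Lipschitz constant
    step = 1.0 / max(L, 1e-12)
    z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        # Gradient of the smooth part: reconstruction term plus l2 penalty.
        grad = 2.0 * B.T @ (B @ z - x) + 2.0 * beta * z
        # l1 term (linear on z >= 0) and non-negativity handled by shift + projection.
        z = np.maximum(0.0, z - step * (grad + gamma))
    return z

# Toy example (hypothetical): two identical basis vectors.
B = np.array([[1.0, 1.0], [0.0, 0.0]])
x = np.array([1.0, 0.0])
z = nonneg_elastic_net(B, x, gamma=0.1, beta=1.0)
```

With two identical basis vectors, the quadratic penalty makes the minimizer unique and spreads the weight equally over both, which is the grouping effect that motivates the elastic net.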


(3) $\ell_2$-Hypergraph Generation

Note that there are two drawbacks of the above two representation-based
approaches: (1) They use an $\ell_2$-norm-based metric to measure the reconstruction
errors, which makes them sensitive to sparse reconstruction errors. (2) Since
these methods create hyperedges by linear reconstruction, they are unable to handle
nonlinear data. By eliminating the sparse noise component from the original data,
integrating the locality, and maintaining the constraint to the linear regression
framework, the $\ell_2$-hypergraph [9] is created to address these issues as

$$\arg\min_{C, E} \|X - XC - E\|_F^2 + \frac{\gamma_1}{2}\|C\|_F^2 + \frac{\gamma_2}{2}\|Q \odot C\|_F^2 + \beta \|E\|_1, \quad \text{s.t. } C^\top \mathbf{1} = \mathbf{1},\ \operatorname{Diag}(C) = 0, \tag{4.12}$$

where $\odot$ stands for element-wise multiplication, $C$ is the coefficient matrix, $E$ is
the data error matrix, and $Q$ is the locality adapter matrix used to retain the local
manifold structures. Hyperedges can then be created using the coefficient matrix $C$.


Representation-based hyperedges evaluate how well each vertex can be reconstructed
from other vertices in the feature space, so the correlations between feature vectors
can be calculated and used to create connections among the vertices.
Similar to the distance-based methods, these methods may suffer from
data noise and outliers. Another drawback of this type of hypergraph generation
method is that only a portion of the relevant samples is chosen for reconstruction
during the computing process, so the resulting hyperedge may not
accurately capture the data correlation over the complete data distribution.


4.3 Explicit Hypergraph Modeling


Different from implicit hypergraph modeling, in some cases explicit connections
already exist among the data. Explicit hypergraph modeling focuses on these scenarios
and generates the hypergraph structure using attribute information or networks.


4.3.1 Attribute-Based Hypergraph Generation 


Real-world data are often associated with attributes. For example,
the users in a social network could have profiles, such as gender, age, and interests.
The visual objects in images could have different characteristics, such as color,
shape, and texture. Given data annotated with different attributes, attribute-based
hypergraph generation methods can be adopted to construct the hypergraph based
on the attribute information, which provides an explicit way to encode semantic
properties and diffuse knowledge [10]. As such a construction scheme leverages
the apparent correlations among objects directly, it can be categorized as an explicit
hyperedge method.

To generate a hypergraph using attributes, two steps are needed:
hypergraph structure construction and hyperedge weight assignment. The first
step is to generate the vertex set $\mathcal{V}$ and hyperedge set $\mathcal{E}$ based on the attribute
information, and the second step is to assign different weights to the hyperedges
and acquire the weight matrix $W$.

When constructing the hypergraph from the attribute data, the samples to be
explored are first modeled as vertices in a hypergraph, denoted as the vertex set
$\mathcal{V}$. An attribute shared by different vertices indicates that these
samples share a common characteristic, which may be an objective tag or a subjective
evaluation. Therefore, each attribute can be regarded as the semantic information on
a connection, i.e., a hyperedge. In attribute-based hypergraph generation methods,
a group of hyperedges (called a hyperedge group) is generated by linking, for each
attribute, all the vertices that share it. The number of hyperedges therefore equals
the number of attributes. Such a hyperedge group
generated based on the attribute information is denoted by

$$\mathcal{E}_{\text{attribute}} = \{N_{\text{att}}(a) \mid a \in \mathcal{A}\}, \tag{4.13}$$

where $N_{\text{att}}(a)$ is the subset of the vertex set $\mathcal{V}$ sharing the attribute $a$, and $\mathcal{A}$ is a set
containing all defined attributes. Sometimes the attributes could be hierarchical, e.g.,
the car within the vehicles. In this case, $\mathcal{A}$ and $\mathcal{E}_{\text{attribute}}$ can be extended to
involve the subtypes of the attributes.

Here we give one simple example to show how to construct the hypergraph
structure using the attribute information, as shown in Fig. 4.6. Given social
network data with user profiles, the users in the social network are first modeled as
vertices $\mathcal{V}$. The user profiles contain objective facts such as gender and age as
well as subjective characteristics such as interests and knowledge, both of which
can be adopted to generate the hyperedge groups. For example, we can have an $e_{\text{female}}$
hyperedge connecting all female users and an $e_{\text{sports}}$ hyperedge linking users who like sports.
Additionally, as discussed above, sometimes the attributes are hierarchical. Under
such circumstances, we can generate hyperedges at different levels to characterize
multiple-scale attribute connections. For instance, suppose users A, B, C, and D
all like sports, users A and B like playing basketball, and users
C and D like playing tennis. In this case, we first generate $e_{\text{sports}}$ connecting users
A, B, C, and D, and then $e_{\text{basketball}}$ and $e_{\text{tennis}}$ are generated to link A, B and C, D,
respectively. The hyperedge set in this example can be written as

$$\mathcal{E} = \{e_{\text{female}}, e_{\text{sports}}, e_{\text{basketball}}, e_{\text{tennis}}\}.$$

Fig. 4.6 An illustration of the attribute-based hyperedge generation method
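
The example above can be sketched in plain Python. The profile dictionary follows the users A to D from the text; which users carry the `female` attribute is not stated there, so that assignment is hypothetical, as is the function name.

```python
def attribute_hyperedges(profiles):
    """Map each attribute a to the set of vertices sharing it, i.e., N_att(a)."""
    edges = {}
    for vertex, attributes in profiles.items():
        for attribute in attributes:
            edges.setdefault(attribute, set()).add(vertex)
    return edges

# Hierarchical attributes are handled by listing both the parent attribute
# ("sports") and the subtype ("basketball" or "tennis") for a user.
profiles = {
    "A": {"sports", "basketball"},
    "B": {"sports", "basketball", "female"},  # "female" assignment is hypothetical
    "C": {"sports", "tennis", "female"},      # "female" assignment is hypothetical
    "D": {"sports", "tennis"},
}
hyperedges = attribute_hyperedges(profiles)
```

The number of hyperedges equals the number of distinct attributes, which matches the statement following Eq. (4.13).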


The hyperedge weights are also important here. For the attribute-based hypergraph,
the number of shared attributes among the samples connected by the hyperedge
can quantitatively reflect the relative correlation strength. Specifically, the more
attributes the samples share, the stronger the connections among the
corresponding vertices, and the larger the weight assigned to the hyperedge.
Each hyperedge $e$ can be seen as a clique. The mean of the heat kernel
weights of the pairwise edges in this clique is considered as the corresponding
hyperedge weight $w(e)$:

$$w(e) = \frac{1}{\delta(e)(\delta(e) - 1)} \sum_{u, v \in e} \exp\left(-\frac{\|X(u) - X(v)\|_2^2}{\sigma^2}\right), \tag{4.14}$$

where $\delta(e)$ indicates the degree of hyperedge $e$, and $X(u)$ and $X(v)$ denote the
feature vectors of vertices $u$ and $v$, respectively.

The attribute-based hypergraph generation method can capture the semantic 
properties apart from the structural information conveyed by the hypergraph 
structures themselves. The attributes serve as a type of intermediate-level feature 
representation of vertices and can provide another description for vertices beyond 
the low-level representations. However, the attributes are not available all the time. 
When there is no natural attribute descriptor for vertices, some extra solutions 
need to be applied to conduct attribute-based hypergraph generation. One possible 
solution is to manually design attribute tags, which may be both cumbersome and 
time-consuming. The other alternative is extracting attribute information from the 
raw low-level features by machine learning models [11]. Such a schema is more 
time-saving than manual definition, whereas the results rely heavily on the accuracy 
of the machine learning model. We also note that the attributes can be nameable, 
which indicates the semantic information can be directly understood, while they 
can also be non-nameable, which means the semantic information is not explicit. 


4.3.2 Network-Based Hypergraph Generation 


There are many applications of network data, including social networks [12],
reaction networks [13], cellular networks [13], and human brain networks [3]. For such
data, the network information can be used to generate the correlations among subjects.
In a typical work of social media analysis [14], the vertices of the hypergraph
represent users and images. Hyperedges capture both the visual-textual relationships
among images and the social links between users and images, which are called
homogeneous and heterogeneous hyperedges, respectively. The nearest-neighbor-based
and attribute-based hyperedge generation methods are used to construct
homogeneous hyperedges representing the visual and textual relations among
images. Users and images are connected through social link relations to construct
heterogeneous hyperedges. As another example, both friendship and mobility information
in location-based social networks can be used to generate hypergraphs [12].
As a result, friendship hyperedges are generated within the social domain,
and check-in hyperedges are generated across the social, semantic, temporal, and
spatial domains. A protein-protein interaction network is also naturally represented by a
hypergraph [15], whose subsets (hyperedges) can be represented by tandem affinity
purification (TAP) data.

Aside from the first-order correlation, high-order correlations, e.g., the second-
and third-order correlations, within the network can also be used for
generating hyperedges. A center vertex can be connected with its first-order and
high-order neighbors (i.e., vertices whose shortest-path distance to the centroid is
greater than 1) through a hyperedge. Only a vertex's low-order neighbors need to be
considered if attention is focused on its local connections in the network. As an
example, users who have similar preferences on items can be connected
within the recommendation network [4] according to first-order and second-order
correlations, which are used to generate a hypergraph and to
perform collaborative filtering for the recommendation. Alternatively, if the information
of a vertex travels a long distance in the network, higher-order correlations are required
to generate hyperedges.

Fig. 4.7 An illustration of the network-based hyperedge generation method

We then introduce two typical approaches to construct hyperedges from the
network/graph structure, i.e., pair-based and k-hop-based. Figure 4.7 illustrates
these two approaches. In this example, $\mathcal{G}_s = (\mathcal{V}_s, \mathcal{E}_s)$ represents the graph
structure, with $v_i \in \mathcal{V}_s$ representing a vertex and $e_{s_{ij}} \in \mathcal{E}_s$ representing an edge
connecting $v_i$ and $v_j$. We let $A$ indicate the adjacency matrix of $\mathcal{G}_s$. From
such a graph structure, two types of hyperedge groups can be generated (Fig. 4.7).


(1) Pair-Based Hyperedge Generation Strategy

$\mathcal{E}_{\text{pair}}$ is adopted to indicate the hyperedges constructed by pair correlations in
the network/graph. $\mathcal{E}_{\text{pair}}$ aims at directly transforming the graph structure into a
group of 2-uniform hyperedges, which can be formulated as follows:

$$\mathcal{E}_{\text{pair}} = \{\{v_i, v_j\} \mid (v_i, v_j) \in \mathcal{E}_s\}. \tag{4.15}$$

As a result, $\mathcal{E}_{\text{pair}}$ covers the low-order (pairwise) correlations in the graph
structure, which is the basic information for high-order correlation modeling.


(2) k-Hop-Based Hyperedge Generation Strategy

$\mathcal{E}_{\text{hop}}$ is adopted to indicate the hyperedges constructed by the k-hop neighbors in the
network/graph. First, we define the k-hop neighborhood of a vertex $v$ in graph $\mathcal{G}_s$
as follows:

$$N_{\text{hop}_k}(v) = \{u \mid A^k_{uv} \neq 0,\ u \in \mathcal{V}_s\}.$$

Based on the positions reachable within k hops in the graph structure, $\mathcal{E}_{\text{hop}}$ aims to
find the related vertices for a central vertex. The range of the values of $k$ is $[2, n_v]$,
where $n_v$ refers to the number of vertices in $\mathcal{G}_s$. The following is an example of a
hyperedge group $\mathcal{E}_{\text{hop}}$ with k-hop:

$$\mathcal{E}_{\text{hop}_k} = \{N_{\text{hop}_k}(v) \mid v \in \mathcal{V}_s\}. \tag{4.16}$$

The hyperedges generated by $\mathcal{E}_{\text{hop}}$ extend the search radius to external vertices,
which leads to groups of vertices rather than just the two vertices joined by an
edge in the graph structure. Compared with the pairwise correlation in $\mathcal{E}_{\text{pair}}$,
they can provide more information about correlations.
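
Both strategies can be sketched in plain Python for an undirected graph given as an edge list. The function names are hypothetical. The k-hop routine expands a frontier k times, which yields exactly the vertices $u$ with a non-zero entry $A^k_{uv}$, i.e., those reachable from $v$ by a walk of exactly k steps.

```python
def pair_hyperedges(edges):
    """Eq. (4.15): every graph edge becomes a 2-uniform hyperedge."""
    return [set(edge) for edge in edges]

def khop_hyperedges(n_vertices, edges, k):
    """Eq. (4.16): one hyperedge per vertex v, holding its k-hop neighborhood."""
    adjacency = [set() for _ in range(n_vertices)]
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    hyperedges = []
    for v in range(n_vertices):
        frontier = {v}
        for _ in range(k):
            # One more step along the edges: vertices adjacent to the frontier.
            frontier = {u for w in frontier for u in adjacency[w]}
        hyperedges.append(frontier)  # empty if v is an isolated vertex
    return hyperedges

# Toy graph (hypothetical): the path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
pair_group = pair_hyperedges(edges)
hop_group = khop_hyperedges(4, edges, k=2)
```

In this undirected example, the 2-hop neighborhood of a vertex contains the vertex itself, since a walk can leave along an edge and return along it.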

Here, we discuss the advantages and limitations of the two types of hyperedges
generated from network data. The pair-based construction
can only model low-order correlations and therefore cannot
naturally explore high-order correlations in some scenarios. In contrast, hyperedges
generated by the k-hop-based method have the high-order information of
the original network built in. However, the high-order information in this type of hyperedge
may be redundant and ambiguous. This is because the connection details in the
k-hop-based hyperedges may be lost, which means that the
original network/graph cannot be reconstructed from this type of hyperedge. Additionally, the k-hop-based
hyperedges may lead to irreversible over-smoothing in each hyperedge, because
the number of k-hop neighbors grows exponentially as k grows.


4.4 Typical Examples of Hypergraph Modeling 


Here we give several examples of hypergraph modeling in real applications, 
including computer vision, recommender systems, computer-aided diagnosis, and 
brain networks, to demonstrate how to construct hypergraphs from data. 


4.4.1 Computer Vision 


Computer vision has attracted much attention in recent decades. In computer vision, 
there are multi-modal data, such as images, point clouds, etc. Both low-level vision 
tasks and high-level vision tasks have been deeply investigated. In these tasks, an 
important but challenging issue is the complex data correlation behind the vision 
data. For example, from the aspect of images, the pixels or patches are the elements 
of an image, while the semantic information for the image is represented by these 
pixels or patches. Terrence Joseph Sejnowski [16] mentioned that “In a task such 
as face recognition, in which important information may be contained in the high- 
order relationships among pixels, it seems reasonable to expect that better basis 
images may be found by methods sensitive to these high-order statistics.” Similar 
situations occur when facing multi-modal 3D object representation. Usually, a 3D
object can be represented in different ways, such as a single image, multiple
views, point clouds, voxels, and meshes. Under such circumstances, the correlation
among these objects becomes even more complicated. A simple graph is not capable
of modeling such high-order relationships among pixels/patches in one image, or
among different 3D objects.

We first look into the high-order correlation modeling for an image. A 2D image
is composed of a set of pixels, and each pixel has a feature vector (channels). To
generate a hypergraph to model the correlation behind this image, we can take each
patch in the image as a vertex in the hypergraph, and the objective is to generate
a group of hyperedges to connect these vertices (patches). Here we can employ
the distance-based hypergraph generation method, in which each patch is selected
as the centroid, and its nearest neighbors in the feature space are connected by a
hyperedge. This process is shown in Fig. 4.8. Furthermore, we can also employ
the spatial information to build connections among these patches. The patches
with close spatial locations in the image could be connected by a hyperedge.
Figure 4.9 shows an example of hypergraph modeling for image patches using
spatial information.
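
A minimal sketch of the spatial variant follows, assuming the patches form a regular grid and that "close" means lying inside a square window around the centroid patch. The window shape, the `radius` parameter, and the row-major patch indexing are assumptions of this sketch.

```python
def spatial_hyperedges(rows, cols, radius=1):
    """One hyperedge per patch: all patches within `radius` grid steps of it."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            edge = {
                rr * cols + cc  # row-major index of the neighboring patch
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            }
            edges.append(edge)
    return edges

# Toy example (hypothetical): a 3 x 3 grid of patches.
edges = spatial_hyperedges(3, 3, radius=1)
```

Hyperedges centered on border patches are smaller than those in the interior because the window is clipped at the image boundary.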

For 3D visual objects, there are complex correlations among them. For example,
different furniture, such as tables and chairs, have legs, and different vehicles, such
as cars and bicycles, have wheels. Another challenging issue comes from the
multi-modality aspect. Given different modal data of 3D objects, the correlations are
composed of intra-modal correlations and cross-modal correlations, as shown
in Fig. 4.10. Given a large number of 3D objects, it is difficult to describe all
these correlations manually in an accurate and complete way.

Fig. 4.8 An example of hypergraph modeling for image patches using feature information

Fig. 4.9 An example of hypergraph modeling for image patches using spatial information

Fig. 4.10 The complex correlations among multi-modal 3D objects

In order to efficiently build a hypergraph structure, we usually extract the features
of 3D objects and then build implicit hypergraphs. 3D objects can be described by
multiple modalities, including point clouds, views, grids, and voxels. We can extract
the descriptors of their respective modalities through the corresponding deep neural
networks, such as dynamic graph CNN (DGCNN) [17] and PointNet [18]
for point cloud data, and multi-view convolutional neural networks (MVCNN) [19] and
group-view convolutional neural networks (GVCNN) [20] for multi-view data.
Once the multi-modal features have been obtained, we can build a hypergraph structure
for each kind of feature.

Here, each 3D object can be represented by a vertex in the hypergraph. Each
time, one object is selected as the centroid in a feature space, and its nearest
neighbors can be connected by a corresponding hyperedge. This process is repeated
until all objects have been selected as the centroid once in this feature space.
Every feature and possible feature combination can be used in this process. In
this way, we can obtain multiple hypergraphs, represented by incidence matrices
$H_1, H_2, \ldots$, to formulate the correlations under different modalities. The
pipeline is demonstrated in Fig. 4.11. We can further concatenate these incidence
matrices along the hyperedge axis to integrate these hypergraphs and obtain the
complete hypergraph structure, as shown in Fig. 4.12.
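
The concatenation step can be sketched with NumPy, assuming every incidence matrix has one row per object (vertex) and one column per hyperedge. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def concat_hypergraphs(incidence_matrices):
    """Stack hyperedges of several hypergraphs that share the same vertex set."""
    n_vertices = incidence_matrices[0].shape[0]
    for H in incidence_matrices:
        if H.shape[0] != n_vertices:
            raise ValueError("all hypergraphs must share the same vertex set")
    return np.concatenate(incidence_matrices, axis=1)  # along the hyperedge axis

# Toy example (hypothetical): hypergraphs built from two modalities of 3 objects.
H_point_cloud = np.array([[1, 0], [1, 1], [0, 1]])
H_multi_view = np.array([[1], [0], [1]])
H = concat_hypergraphs([H_point_cloud, H_multi_view])
```

The combined hypergraph keeps the vertex set unchanged and simply contains the hyperedges of all modalities side by side.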


4.4.2 Recommender System 


In a recommender system, the relationship between users and items can be
represented by a bipartite graph, that is, if an item is in a user's recommendation
list, then we connect the user vertex with the item vertex. This bipartite graph can
be simply transformed into a hypergraph, where vertices on one side remain and
vertices on the other side become hyperedges, as shown in Fig. 4.13. In this way,
each user can be represented as a vertex in the hypergraph, and the users who share
the same item can be connected by a corresponding hyperedge. If the items
are instead regarded as vertices, then the hyperedges are generated using shared users. This
hypergraph generation procedure follows the attribute-based strategy.

Fig. 4.11 An example of hypergraph modeling for multi-modal 3D objects

Fig. 4.12 An illustration of multi-hypergraph combination. This figure is from [5]

Fig. 4.13 An example of hypergraph modeling for a recommender system

Mathematically, the ranking matrix of the recommender system equals the
incidence matrix of the corresponding hypergraph. With this transformation, we
can solve the problem in recommender systems via hypergraph learning methods.
In fact, undirected bipartite graph modeling and hypergraph modeling are
interchangeable in some cases. If the edges in the bipartite graph are weighted, we can use
hyperedge-dependent vertex weights accordingly.
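
The correspondence between the bipartite graph and the hypergraph can be sketched as follows, assuming a binary user-item matrix `R` with one row per user and one column per item. The matrix values and the function names are hypothetical toy choices.

```python
import numpy as np

def bipartite_to_hypergraphs(R):
    """Return (H_users, H_items) for a binary user-item matrix R.

    H_users: users are vertices and each item is a hyperedge (H = R).
    H_items: items are vertices and each user is a hyperedge (H = R^T).
    """
    R = np.asarray(R)
    return R, R.T

def hyperedges_from_incidence(H):
    """List the vertex set of every hyperedge (column) of H."""
    return [set(np.flatnonzero(H[:, j]).tolist()) for j in range(H.shape[1])]

# Toy data (hypothetical): 4 users, 3 items.
R = np.array([[1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
H_users, H_items = bipartite_to_hypergraphs(R)
item_hyperedges = hyperedges_from_incidence(H_users)
```

No computation is needed for the transformation itself: choosing which side of the bipartite graph plays the role of vertices only decides whether the matrix or its transpose is used as the incidence matrix.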


4.4.3 Computer-Aided Diagnosis 


In computer-aided diagnosis, the main objective is to determine whether an incoming
patient has a specific disease, or how serious the disease is. For
diagnosis, the experience and knowledge come from previous medical records.
Case-based diagnosis has shown its importance in practice. For AI-based computer-aided
diagnosis, it is important to explore the existing labeled training data, which could
be very few in some cases. These medical records may contain different examination
files, MR images, CT images, and other types of data.

A conventional pipeline for computer-aided diagnosis is first extracting features 
from clinical text or medical imaging data and then applying computer programs 
to automatically categorize healthy people and patients. The commonly used 
techniques involve natural language processing, medical imaging analysis, machine 
learning, etc. It is worth noting that the existing methods mostly focus on individual 
subject classification. Under such circumstances, how to model the correlation
among these subjects, including the training data and the incoming patient (the testing
data), is an important but difficult task.

Here, a hypergraph at the subject level, i.e., each vertex stands for a subject,
can be generated, where the hyperedges can be created using the distance-based
method or the attribute-based method. Given the MR images or other medical data, the
features can be used to measure the distance between every pair of subjects. Then, the
k-NN scheme can be used to select the nearest neighbors for a centroid vertex and
generate a corresponding hyperedge, as shown in Fig. 4.14.

Fig. 4.14 An example of hypergraph modeling for computer-aided diagnosis

Another type of application is to model the inter-correlation within one medical
image, such as a gigapixel whole-slide histopathological image (WSI). Survival
prediction is an important task in medical image analysis, which aims to model
the life duration of a patient using WSIs. Different from traditional images, WSIs
are very large and rich in detail. Therefore, traditional image representation
methods do not work well in this task. To formulate the inter-correlation inside a
WSI, a hypergraph can be generated to model the correlations among patches.
A group of patches can be sampled from the original WSI, such as 2000 or
8000 patches. Then, these patches are represented as vertices in the corresponding
hypergraph. The hyperedges can be generated based on either the visual features of
these patches or the spatial information, or both of them, using the distance-based
hypergraph generation methods.


4.4.4 Brain Network 


Recently, the development of neuroimaging techniques has provided a way to
understand the brain network on a large scale. Studies have shown that the
interaction relationships in the brain, from neuronal information flow to whole-brain
functional networks, are the basis of its functionality. Therefore, formulating
the brain as a complex network and decoding its signals may further deepen
our understanding of the human cognitive processes. The conventional functional
network is usually modeled and represented based on pairwise correlations between
two brain regions. However, neurologically, a brain region predominantly interacts
with one or more other brain regions.

Fig. 4.15 An example of hypergraph modeling for brain network

When using hypergraphs to model a single brain network, the vertices denote 
brain regions, and the hyperedges represent the interactions among multiple regions. 
Each element in the incidence matrix corresponds to the contribution of the brain 
region to the specific function, as shown in Fig. 4.15. In this process, each region 
can be selected as the centroid, and its nearest neighbor regions in the feature space 
can be selected and connected by a corresponding hyperedge. 


4.5 Hypergraph Modeling in Next Stage 


In this part, we discuss future research topics in hypergraph modeling that may
render it more accurate and flexible, including adaptive hypergraph modeling,
generative hypergraph modeling, and knowledge hypergraph generation.


4.5.1 Adaptive Hypergraph Modeling 


Once the hypergraph structure has been initialized, it usually remains fixed during
the learning process. However, the initial hypergraph structure constructed by
existing hypergraph modeling methods contains many noisy connections that may be
destructive for the learning process. Therefore, the original structure needs to be
optimized according to the data and downstream tasks to reduce structural noise.
Although there is some existing work on hypergraph structure optimization, these
efforts are still far from the goal of accurately modeling complex data correlations.

At this stage, the selection of hypergraph generation methods still depends on
experience rather than on a theoretical strategy. A possible route to automated
hypergraph generation is to create various hypergraphs via different approaches
and then group them together to obtain a more complex but relatively complete
hypergraph. The grouping weights can be learned in the training stage. Another
way is to update the incidence matrix of the hypergraph structure, which can be
either directly optimized as learnable parameters or indirectly optimized via metric
learning.


4.5.2 Generative Hypergraph Modeling 


Generative models are a set of models that learn the distribution of the
observed data and generate new data instances based on probability. They have
been widely used in different tasks such as generation, synthesis, translation,
reconstruction, and prediction. In recent years, with the development of deep
graph representation learning, deep graph generative models have attracted much
attention. Given a series of training graph data (assumed to be taken from the same
distribution), the neural network is trained as a graph generation model. Inspired
by these generative models, building a hypergraph by estimating the distribution
of latent structures from observed data may be a viable approach. Given a set of
training hypergraphs or sampling signals from every vertex, the distribution can be
implicitly or explicitly derived by combining hypergraph embeddings and generative
models.

However, there is still a long way to go for hypergraph generative models to
become practical. Unlike simple graphs, whose distributions are the joint
distributions of all pairwise correlations between data, the distribution of a
hypergraph structure is the joint distribution of all high-order correlations among
data. Therefore, the joint distribution is high-dimensional, and the variables are
dependent on each other. Estimating the density function is intractable because of
its considerable complexity. Furthermore, due to the high dimensionality, a large
amount of observed data is required to make the density estimate close to the true
distribution, and such data are difficult to obtain in practical applications.
Despite the above obstacles, generative hypergraph modeling is an area worth
exploring in the future and may become useful in many areas, such as simulations of
complex physical systems, trajectory tracking, system identification, and community
detection.


4.5.3 Knowledge Hypergraph Generation 


Knowledge hypergraphs have attracted much attention in recent years since they can
store facts using high-arity relations. In a knowledge hypergraph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the vertices represent the set of
entities, and the hyperedges represent the high-arity relations. The basic unit is
a fact based on a high-arity relation. Unlike a knowledge graph, which only uses
binary relations, the relations in a knowledge hypergraph are defined on any number
of entities.

Although there have been several pieces of work targeting knowledge hypergraph
embedding and completion, such as Multi-fold TransH (m-TransH) [21],
Hyper-relational Knowledge Graph Embedding (HINGE) [22], and N-ary Link
Prediction (NaLP) [23], they are mostly based on the assumption that there exists an
initial knowledge hypergraph or some hyper-relational links. Few efforts have been
made on the initial knowledge hypergraph generation. In practice, manually mining
the hyper-relations among entities requires much time and effort. Therefore, it is
of great significance to study knowledge hypergraph generation methods for
efficient and comprehensive knowledge inference.


4.6 Summary 


In this chapter, we introduce the hypergraph modeling methods, which are
categorized as the implicit type and the explicit type. Implicit hyperedges can
be used in tasks in which we can represent each subject and develop metrics to
evaluate sample similarity. By using the sparse representation, representation-based
approaches might mitigate the impact of noisy vertices in comparison with
distance-based ones. Explicit hyperedges are more appropriate when the input data
already have certain structural details. In general, choosing a suitable hyperedge
generation method is important for a specific task. Finally, adaptive and generative
hypergraph modeling are worth further exploration to adjust hypergraph structures
based on the data and the ongoing tasks.


Chapter 5
Typical Hypergraph Computation Tasks


Abstract After hypergraph structure generation for the data, the next step is how
to conduct data analysis on the hypergraph. In this chapter, we introduce four
typical hypergraph computation tasks, including label propagation, data clustering,
cost-sensitive learning, and link prediction. The first typical task is label
propagation, which is to predict the labels for the vertices, i.e., assigning a
label to each unlabeled vertex in the hypergraph, based on the labeled information.
In general cases, label propagation is to propagate the label information from
labeled vertices to unlabeled vertices through structural information of the
hyperedges. In this part, we discuss the hypergraph cut and the random walk
interpretation of label propagation on hypergraphs. The second typical task is data
clustering, which is formulated as dividing the vertices into several parts in a
hypergraph. In this part, we introduce a hypergraph Laplacian smoothing filter and
an embedded model for hypergraph clustering tasks. The third typical task is
cost-sensitive learning, which targets learning with different mis-classification
costs. The fourth typical task is link prediction, which aims to discover missing
relations or predict newly arriving hyperedges based on the observed hypergraph.


5.1 Introduction 


In previous chapters, we have introduced how to generate the hypergraph structure
given observed data. After the hypergraph generation step, how to use this
hypergraph for different applications becomes the key task. Hypergraphs have the
potential to be used in different areas, such as social media analysis, medical and
biological applications, and computer vision. We notice that most of the
applications can be categorized into several typical tasks and follow similar
application patterns. In this chapter, we introduce several typical hypergraph
computation tasks, which can be used for different applications.

More specifically, four typical tasks, including label propagation, data clustering,
cost-sensitive learning, and link prediction, are introduced in this chapter. The first
typical task is label propagation, which is also one of the most widely used methods
in machine learning. The objective of label propagation is to assign a label to each
unlabeled sample. In general cases, label propagation on a hypergraph is to propagate
the label information from labeled vertices to unlabeled vertices through structural
information of the hyperedges. The random walk is a basic process for information
propagation, which also plays a fundamental role here. We then review
the hypergraph cut and random-walk-based label propagation on
hypergraphs. We introduce the label propagation process on a single hypergraph and
on multi-hypergraphs [1, 2], respectively, in this part.

The second typical task is data clustering, which targets grouping data into
different clusters. We introduce how to conduct data clustering using hypergraph
computation. The hypergraph structure can be used as guidance for the clustering
criteria. Two types of hypergraph clustering methods are introduced, namely
structural hypergraph clustering and attribute hypergraph clustering, according to
the data information in the hypergraph. In a structural hypergraph, the clustering
tasks only use structural information, while in an attribute hypergraph, each vertex
is usually accompanied by attribute information from the real world. We introduce
a hypergraph Laplacian smoothing filter and an embedded model designed specifically
for hypergraph clustering tasks, named the adaptive hypergraph auto-encoder
(AHGAE) [3].

The third typical task is cost-sensitive learning, which is to solve the learning task
under scenarios with different mis-classification costs, such as when confronting an
imbalanced data distribution. Here, we introduce two hypergraph computation
methods, i.e., cost-sensitive hypergraph computation [4] and cost interval
optimization for hypergraph computation [5]. First, we introduce a cost-sensitive
hypergraph modeling method, in which the cost for different objectives is fixed in
advance. As the exact cost value may not be easy to determine, we then introduce a
cost interval optimization method, which can utilize the cost chosen inside the
interval while generating data with high-order relations.

The fourth typical task is link prediction, which is to predict data relationships
and can be used for recommender systems and other applications. Here,
hypergraph link prediction is to mine the missing hyperedges or predict newly
arriving hyperedges based on the observed hypergraph. We introduce a variational
auto-encoder for heterogeneous hypergraph link prediction [6]. It aims to learn the
low-dimensional heterogeneous hypergraph embedding based on the Bayesian deep
generative strategy. The heterogeneous encoder generates the vertex embedding
and the hyperedge embedding, and the hypergraph embedding is the combination of
them. The hypergraph decoder reconstructs the incidence matrix based on the vertex
embedding and the hyperedge embedding, and the heterogeneous hypergraph is
generated based on the reconstructed incidence matrix.

Part of the work introduced in this chapter has been published in [1-6].


5.2 Label Propagation on Hypergraph


This section mainly introduces the label propagation task on hypergraph. We first
introduce the basic assumptions of the label propagation process. Given a set of
vertices on a hypergraph, some of the vertices are labeled, while the others are
unlabeled. The task is to predict the labels of the unlabeled vertices given
the available label information and the hypergraph structure. As shown in Fig. 5.1,
the label propagation process propagates the label information from the labeled
vertices to the unlabeled vertices.

When propagating label information, vertices within the same hyperedge share similar
attributes in some aspects and therefore have a higher probability of sharing
the same label. Under this assumption, the label propagation task can be transformed
into a hypergraph cut. In a hypergraph cut, the goal is to make the cut hyperedges as
sparse as possible, with each vertex set after the cut as dense as possible. After
cutting the hypergraph, different sets of vertices have different labels, which
satisfies the above assumption. The form of the hypergraph cut is described below.

Suppose a vertex set $S \subset \mathcal{V}$ and its complement $\bar{S}$. There is a
cut that splits $\mathcal{V}$ into $S$ and $\bar{S}$. A hyperedge $e$ is cut if it is
incident with the vertices in both $S$ and $\bar{S}$. Define the hyperedge boundary
$\partial S$ as the set of cut hyperedges, i.e.,
$\partial S = \{e \in \mathcal{E} \mid e \cap S \neq \emptyset, e \cap \bar{S} \neq \emptyset\}$,
and let the volume of $S$, $\mathrm{vol}(S)$, be the sum of the degrees of the vertices
in $S$, i.e., $\mathrm{vol}(S) = \sum_{v \in S} \mathbf{D}_v(v)$. It can be shown that

$$
\mathrm{vol}(\partial S) = \sum_{e \in \partial S} w(e) \frac{|e \cap S|\,|e \cap \bar{S}|}{\delta(e)},
\tag{5.1}
$$

where $\delta(e)$ denotes the degree of hyperedge $e$.

The derivation is shown as follows, and the details can be found in [7]. Suppose
that hyperedge $e$ is a clique, i.e., a fully connected graph. To avoid confusion, the
edges in the clique are called subedges. Then, the weight $w(e)/\delta(e)$ is assigned
to each subedge. When the hyperedge $e$ is cut, $|e \cap S| \times |e \cap \bar{S}|$
subedges are cut. The volume of the cut is the sum of the weights over these subedges.
Recall that our goal is to make the cut edges as sparse as possible, with each vertex
set after the cut as dense as possible. Based on the goal, the objective partition
formula is written as

$$
\arg\min_{\emptyset \neq S \subset \mathcal{V}} c(S) = \mathrm{vol}(\partial S) \left( \frac{1}{\mathrm{vol}(S)} + \frac{1}{\mathrm{vol}(\bar{S})} \right).
\tag{5.2}
$$

Fig. 5.2 An illustration of S = (v,,v5,v3)
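
To make Eqs. (5.1) and (5.2) concrete, the following sketch evaluates the boundary volume and the cut objective on a small hypergraph. The toy incidence matrix, the unit weights, and the helper name `cut_cost` are our own illustrative assumptions. Hyperedges that are not cut contribute zero automatically, because one of the two counts is zero.

```python
import numpy as np

# Toy hypergraph: 5 vertices (rows) and 3 hyperedges (columns).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
w = np.array([1.0, 1.0, 1.0])        # hyperedge weights w(e)


def cut_cost(H, w, in_S):
    """Return (vol(boundary of S), c(S)) for the vertex subset flagged by in_S."""
    in_S = np.asarray(in_S, dtype=bool)
    d_v = H @ w                       # vertex degrees d(v)
    delta = H.sum(axis=0)             # hyperedge degrees delta(e)
    n_in = H[in_S].sum(axis=0)        # |e ∩ S| for every hyperedge
    n_out = H[~in_S].sum(axis=0)      # |e ∩ S-bar| for every hyperedge
    # Eq. (5.1): uncut hyperedges have n_in * n_out == 0 and drop out.
    vol_boundary = np.sum(w * n_in * n_out / delta)
    vol_S, vol_Sbar = d_v[in_S].sum(), d_v[~in_S].sum()
    # Eq. (5.2): normalized cut objective.
    return vol_boundary, vol_boundary * (1.0 / vol_S + 1.0 / vol_Sbar)


vol_b, c = cut_cost(H, w, [True, True, True, False, False])
```

For this partition only the second hyperedge is cut, with two vertices on one side and one on the other, so the boundary volume is 2/3 and the objective is 16/45.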


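There are many methods to propagate label information on a hypergraph, and
propagation based on random walks is the most widely used. The following
describes label propagation by random walk, as illustrated in Fig. 5.2. Suppose
that the current position is $u \in \mathcal{V}$. We first walk to a hyperedge $e$,
chosen among all hyperedges incident with $u$ with probability proportional to
$w(e)$, and then sample a vertex $v \in e$ uniformly. Generalizing from typical
random walks on graphs, we use $\mathbf{P}$ as the transition probability matrix of
the random walk on a hypergraph, and the element $p(u, v)$ is defined as follows:

$$
p(u, v) = \sum_{e \in \mathcal{E}} w(e) \frac{\mathbf{H}(u, e)}{\mathbf{D}_v(u)} \frac{\mathbf{H}(v, e)}{\mathbf{D}_e(e)}.
\tag{5.3}
$$

The formula can be organized into a matrix form as
$\mathbf{P} = \mathbf{D}_v^{-1} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top$. The
stationary distribution $\pi$ of the random walk is defined as

$$
\pi(v) = \frac{d(v)}{\mathrm{vol}(\mathcal{V})},
\tag{5.4}
$$

where $\mathbf{D}_v(v)$ is denoted by $d(v)$ and $\mathbf{D}_e(e)$ by $\delta(e)$ for
short, and $\mathrm{vol}(\cdot)$ is the volume of the vertices in set $S$, defined as
$\mathrm{vol}(S) = \sum_{v \in S} d(v)$. The formula can be derived from

$$
\begin{aligned}
\sum_{u \in \mathcal{V}} \pi(u)\, p(u, v)
&= \sum_{u \in \mathcal{V}} \frac{d(u)}{\mathrm{vol}(\mathcal{V})} \sum_{e \in \mathcal{E}} w(e) \frac{\mathbf{H}(u, e)\, \mathbf{H}(v, e)}{\mathbf{D}_v(u)\, \mathbf{D}_e(e)} \\
&= \frac{1}{\mathrm{vol}(\mathcal{V})} \sum_{e \in \mathcal{E}} w(e)\, \mathbf{H}(v, e) \sum_{u \in \mathcal{V}} \frac{\mathbf{H}(u, e)}{\mathbf{D}_e(e)} \\
&= \frac{1}{\mathrm{vol}(\mathcal{V})} \sum_{e \in \mathcal{E}} w(e)\, \mathbf{H}(v, e)
= \frac{d(v)}{\mathrm{vol}(\mathcal{V})}.
\end{aligned}
\tag{5.5}
$$

The objective function Eq. (5.2) can be written as

$$
c(S) = \frac{\mathrm{vol}(\partial S)}{\mathrm{vol}(\mathcal{V})} \left( \frac{1}{\mathrm{vol}(S)/\mathrm{vol}(\mathcal{V})} + \frac{1}{\mathrm{vol}(\bar{S})/\mathrm{vol}(\mathcal{V})} \right),
\tag{5.6}
$$

and then we arrive at

$$
\frac{\mathrm{vol}(S)}{\mathrm{vol}(\mathcal{V})} = \sum_{v \in S} \frac{d(v)}{\mathrm{vol}(\mathcal{V})} = \sum_{v \in S} \pi(v),
\tag{5.7}
$$

where the ratio $\mathrm{vol}(S)/\mathrm{vol}(\mathcal{V})$ is the probability with which
the random walk occupies some vertex in $S$. It can then be shown that

$$
\begin{aligned}
\frac{\mathrm{vol}(\partial S)}{\mathrm{vol}(\mathcal{V})}
&= \sum_{e \in \partial S} \frac{w(e)}{\mathrm{vol}(\mathcal{V})} \frac{|e \cap S|\,|e \cap \bar{S}|}{\delta(e)} \\
&= \sum_{e \in \partial S} \sum_{u \in e \cap S} \sum_{v \in e \cap \bar{S}} \frac{w(e)}{\mathrm{vol}(\mathcal{V})} \frac{\mathbf{H}(u, e)\, \mathbf{H}(v, e)}{\delta(e)} \\
&= \sum_{e \in \partial S} \sum_{u \in e \cap S} \sum_{v \in e \cap \bar{S}} w(e) \frac{d(u)}{\mathrm{vol}(\mathcal{V})} \frac{\mathbf{H}(u, e)\, \mathbf{H}(v, e)}{d(u)\, \delta(e)} \\
&= \sum_{u \in S} \sum_{v \in \bar{S}} \frac{d(u)}{\mathrm{vol}(\mathcal{V})} \sum_{e \in \partial S} w(e) \frac{\mathbf{H}(u, e)\, \mathbf{H}(v, e)}{d(u)\, \delta(e)} \\
&= \sum_{u \in S} \sum_{v \in \bar{S}} \pi(u)\, p(u, v),
\end{aligned}
\tag{5.8}
$$

where the ratio $\mathrm{vol}(\partial S)/\mathrm{vol}(\mathcal{V})$ is the probability with
which the random walk moves from a vertex in $S$ to $\bar{S}$ under the stationary
distribution. It can be seen that the hypergraph normalized
cut criterion is to search for a cut such that the probability with which the random
walk crosses different clusters is as small as possible, while the probability with
which the random walk stays in the same cluster is as large as possible.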
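
The random walk interpretation can be checked numerically. The sketch below builds the transition matrix of Eq. (5.3) in its matrix form, verifies that the degree-proportional distribution is stationary, and compares the probability of crossing the cut with the boundary volume ratio. The toy incidence matrix, the weights, and the chosen subset are illustrative assumptions of ours.

```python
import numpy as np

# Toy hypergraph: 5 vertices (rows) and 3 hyperedges (columns).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
w = np.array([1.0, 2.0, 0.5])         # hyperedge weights w(e)

d_v = H @ w                           # vertex degrees d(v)
delta = H.sum(axis=0)                 # hyperedge degrees delta(e)

# Matrix form of Eq. (5.3): P = Dv^{-1} H W De^{-1} H^T.
P = np.diag(1.0 / d_v) @ H @ np.diag(w / delta) @ H.T

# Candidate stationary distribution: pi(v) = d(v) / vol(V).
pi = d_v / d_v.sum()

# Probability of stepping from S to its complement under pi.
in_S = np.array([True, True, True, False, False])
crossing = (pi[in_S, None] * P[np.ix_(in_S, ~in_S)]).sum()

# Boundary volume of S; uncut hyperedges contribute zero.
n_in = H[in_S].sum(axis=0)
n_out = H[~in_S].sum(axis=0)
vol_boundary = np.sum(w * n_in * n_out / delta)
ratio = vol_boundary / d_v.sum()      # vol(boundary of S) / vol(V)
```

Here `crossing` and `ratio` agree up to floating-point error, which is the identity stated at the end of the derivation above.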

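Let us review the objective function Eq. (5.2). Note that this combinatorial problem
is NP-complete; however, it can be relaxed into the following real-valued
optimization problem:

$$
\begin{aligned}
\arg\min_{f \in \mathbb{R}^{|\mathcal{V}|}} \; & \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{\{u, v\} \subseteq e} \frac{w(e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^2 \\
\text{s.t.} \; & \sum_{v \in \mathcal{V}} f^2(v) = 1, \quad \sum_{v \in \mathcal{V}} f(v) \sqrt{d(v)} = 0,
\end{aligned}
\tag{5.9}
$$

where $f$ is the to-be-learned score vector. Since the goal is label propagation,
label information is available for some of the vertices. The optimization problem
then becomes the transductive inference problem

$$
\arg\min_{f \in \mathbb{R}^{|\mathcal{V}|}} \left\{ \Omega(f) + \lambda R_{\mathrm{emp}}(f) \right\},
\tag{5.10}
$$

where the regularizer term is $\Omega(f)$, the empirical loss term is
$R_{\mathrm{emp}}(f) = \|f - y\|^2 = \sum_{v \in \mathcal{V}} (f(v) - y(v))^2$,
$y \in \mathbb{R}^{|\mathcal{V}|}$ is the label vector, and $\lambda$ is the balance
parameter. Let us assume that the $i$-th vertex is labeled, and the elements of $y$ are
all 0 except the $i$-th value that is 1. The regularizer $\Omega(f)$ can be turned into

$$
\begin{aligned}
\Omega(f)
&= \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{\{u, v\} \subseteq e} \frac{w(e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^2 \\
&= \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{w(e)\, \mathbf{H}(u, e)\, \mathbf{H}(v, e)}{\delta(e)} \left( \frac{f^2(u)}{d(u)} - \frac{f(u) f(v)}{\sqrt{d(u) d(v)}} \right) \\
&= \sum_{u \in \mathcal{V}} f^2(u) \sum_{e \in \mathcal{E}} \frac{w(e)\, \mathbf{H}(u, e)}{d(u)} \sum_{v \in \mathcal{V}} \frac{\mathbf{H}(v, e)}{\delta(e)}
 - \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{f(u)\, \mathbf{H}(u, e)\, w(e)\, \mathbf{H}(v, e)\, f(v)}{\sqrt{d(u) d(v)}\, \delta(e)} \\
&= f^\top (\mathbf{I} - \Theta) f,
\end{aligned}
\tag{5.11}
$$

where $\Theta = \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2}$.
The hypergraph Laplacian is denoted by $\Delta = \mathbf{I} - \Theta$. Therefore, the
objective function can be rewritten as

$$
\Omega(f) = f^\top \Delta f.
\tag{5.12}
$$

The optimization function can be turned into

$$
\arg\min_{f \in \mathbb{R}^{|\mathcal{V}|}} \left\{ f^\top \Delta f + \lambda \|f - y\|^2 \right\}.
\tag{5.13}
$$

There are two ways to solve the above problem. The first one is to differentiate
the objective function in Eq. (5.13) with respect to $f$ and set the derivative to
zero, which yields

$$
f = \left( \mathbf{I} + \frac{1}{\lambda} \Delta \right)^{-1} y.
\tag{5.14}
$$

The second one is an iterative method. Similar to the iterative approach in [8],
Eq. (5.13) can be efficiently solved by an iterative process. The process is illustrated
in Fig. 5.3. The value $f^{(t+1)}$ is obtained from the previous iterate $f^{(t)}$ and
$y$, and the procedure is repeated until convergence.

Step 1: Initialize $f^{(0)}$, with $t = 0$.
Step 2: Update $f$ by $f^{(t+1)} = \frac{1}{1 + \lambda} \left( (\mathbf{I} - \Delta) f^{(t)} + \lambda y \right)$.
Step 3: Let $t = t + 1$, and iterate back to Step 2 until convergence.

Fig. 5.3 The iterative solution of Eq. (5.13). This figure is from [1]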


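This process converges to the solution in Eq. (5.14). To prove it, we first show
that the eigenvalues of $\Theta$ are in $[-1, 1]$. Since
$\Theta = \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2}$
is similar to the row-stochastic matrix
$\mathbf{P} = \mathbf{D}_v^{-1} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top$
(namely, $\Theta = \mathbf{D}_v^{1/2} \mathbf{P} \mathbf{D}_v^{-1/2}$), we find that its
eigenvalues are in $[-1, 1]$. Therefore, $\mathbf{I} \pm \Theta$ are positive
semi-definite.

The convergence of the iterative process is proved in [1]. Without loss of
generality, we assume $f^{(0)} = y$. From the iterative process, it can be obtained that

$$
\begin{aligned}
f^{(t)} &= \frac{1}{1 + \lambda} \Theta f^{(t-1)} + \frac{\lambda}{1 + \lambda} y \\
&= (1 - \zeta) \sum_{i=0}^{t-1} (\zeta \Theta)^i y + (\zeta \Theta)^t y,
\end{aligned}
\tag{5.15}
$$

where $\zeta = \frac{1}{1 + \lambda}$. Since $0 < \zeta < 1$, and the eigenvalues of
$\Theta$ are in $[-1, 1]$, it can be derived that

$$
\lim_{t \to \infty} (\zeta \Theta)^t = 0
\tag{5.16}
$$

and

$$
\lim_{t \to \infty} \sum_{i=0}^{t-1} (\zeta \Theta)^i = (\mathbf{I} - \zeta \Theta)^{-1}.
\tag{5.17}
$$

Then, it turns out that

$$
f = \lim_{t \to \infty} f^{(t)} = (1 - \zeta)(\mathbf{I} - \zeta \Theta)^{-1} y = \left( \mathbf{I} + \frac{1}{\lambda} \Delta \right)^{-1} y.
\tag{5.18}
$$

Therefore, the iterative process is proved to converge to the closed-form solution
Eq. (5.14).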
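
The two solution strategies can be compared on a toy example. The following sketch is a minimal illustration, assuming a small hand-made incidence matrix, unit hyperedge weights, a balance parameter of 1, and a single labeled vertex; none of these values come from [1]. It computes the closed-form solution of Eq. (5.14) and runs the iteration of Fig. 5.3 from the initial value y.

```python
import numpy as np

# Toy hypergraph: 5 vertices (rows) and 3 hyperedges (columns).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
w = np.ones(3)                        # hyperedge weights w(e)
n = H.shape[0]

d_v = H @ w                           # vertex degrees
delta = H.sum(axis=0)                 # hyperedge degrees
Dv_inv_sqrt = np.diag(d_v ** -0.5)

# Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, Laplacian = I - Theta.
Theta = Dv_inv_sqrt @ H @ np.diag(w / delta) @ H.T @ Dv_inv_sqrt
Laplacian = np.eye(n) - Theta

lam = 1.0                             # balance parameter lambda (assumed value)
y = np.zeros(n)
y[0] = 1.0                            # only the first vertex is labeled

# Closed form, Eq. (5.14): f = (I + Laplacian / lambda)^{-1} y.
f_closed = np.linalg.solve(np.eye(n) + Laplacian / lam, y)

# Iterative form, Fig. 5.3: f <- ((I - Laplacian) f + lambda y) / (1 + lambda).
f_iter = y.copy()
for _ in range(200):
    f_iter = (Theta @ f_iter + lam * y) / (1.0 + lam)

eigenvalues = np.linalg.eigvalsh(Theta)   # Theta is symmetric
```

Because each step contracts the error by a factor of at most $\zeta = 1/(1+\lambda)$, the iterate matches the closed-form solution to machine precision well before 200 steps. The iterative form avoids solving a linear system, which matters when the number of vertices is large.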

The random-walk-based method is the most commonly used approach in label 
propagation on hypergraphs. It has the advantages of being simple to implement 
and theoretically verifiable. 

In many cases, different hypergraphs may be generated based on different
criteria. Under such circumstances, we need to conduct label propagation on a
multi-hypergraph. Here, we briefly introduce the cross diffusion method on
multi-hypergraph [2]. We assume that there are $T$ hypergraphs, and the $t$-th
hypergraph is denoted as $\mathcal{G}^t = (\mathcal{V}^t, \mathcal{E}^t, \mathbf{W}^t)$,
where $\mathcal{V}^t$ is the vertex set, $\mathcal{E}^t$ is the hyperedge set,
and $\mathbf{W}^t$ is a diagonal matrix representing the weights of the hyperedges.

The transition matrix is first generated for each hypergraph. The label
propagation process on hypergraph is based on the assumption that the local
similarities could approximate the long-range similarities, and therefore, the
similarities between nearby vertices are more important than those between far-away
vertices. The similarity matrix among vertices of the $t$-th hypergraph is given by

$$
\mathbf{A}^t(u, v) = \sum_{e \in \mathcal{E}^t} \frac{\mathbf{W}^t(e)\, \mathbf{H}^t(u, e)\, \mathbf{H}^t(v, e)}{\delta^t(e)},
\tag{5.19}
$$

or in the matrix form:

$$
\mathbf{A}^t = \mathbf{H}^t \mathbf{W}^t (\mathbf{D}_e^t)^{-1} (\mathbf{H}^t)^\top.
\tag{5.20}
$$

The transition matrix $\mathbf{P}^t$ is the normalized similarity matrix:

$$
\mathbf{P}^t(i, j) = \frac{\mathbf{A}^t(i, j)}{\sum_{k} \mathbf{A}^t(i, k)}
\tag{5.21}
$$

and

$$
\mathbf{P}^t = (\mathbf{D}^t)^{-1} \mathbf{A}^t,
\tag{5.22}
$$

where $\mathbf{D}^t$ is a diagonal matrix with the $i$-th diagonal element
$\mathbf{D}^t(i, i) = \sum_{j} \mathbf{A}^t(i, j)$.

The element $\mathbf{P}^t(i, j)$ of the transition matrix represents the probability of
transition from vertex $i$ to vertex $j$, and $\mathbf{P}^t$ could be regarded as the
Parzen window estimators on the hypergraph structure. After the generation of the
transition matrix, the cross label propagation process is applied to the
multi-hypergraph structure.
Denote $\mathbf{Y}_0$ as the initial label matrix. For labeled vertices, the $i$-th
row of $\mathbf{Y}_0$ is the one-hot label of the $i$-th vertex, while for the
unlabeled vertices, all elements of the $i$-th row are 0.5, indicating that there is
no prior knowledge of the label. We denote the labeled part of the initial label
matrix as $\mathbf{Y}_0^L$.

For simplicity, we assume the number of hypergraphs $T$ is 2. The label
propagation process for multi-hypergraph uses the output of one hypergraph as the
input of the other hypergraph, and repeats until the output converges. The process
could be formulated as

$$
\mathbf{Y}^1_{d+1} \leftarrow \mathbf{P}^1 \mathbf{Y}^2_d,
\tag{5.23}
$$

$$
\mathbf{Y}^{1,L}_{d+1} \leftarrow \mathbf{Y}_0^L,
\tag{5.24}
$$

and

$$
\mathbf{Y}^2_{d+1} \leftarrow \mathbf{P}^2 \mathbf{Y}^1_d,
\tag{5.25}
$$

$$
\mathbf{Y}^{2,L}_{d+1} \leftarrow \mathbf{Y}_0^L,
\tag{5.26}
$$

where $\mathbf{Y}^k_d$ denotes the label matrix of the $k$-th hypergraph after $d$
times of label propagation, and the superscript $L$ denotes its labeled part.
This process is shown in Fig. 5.4.

Fig. 5.4 An illustration of the diffusion process on multi-hypergraph. This figure is from [2]

The overall label matrix could be calculated according to the label matrix of each
hypergraph after convergence:

$$
\mathbf{Y}_{\mathrm{final}} = \frac{1}{T} \sum_{i=1}^{T} \mathbf{Y}^i.
\tag{5.27}
$$


For more complicated scenarios, where more than two hypergraphs are available,
the label propagation process can be repeated in the same way, and the output of one
hypergraph can be used as the input of other hypergraphs.
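
The cross diffusion procedure for two hypergraphs can be sketched as follows. This is a minimal illustration under our own assumptions rather than the implementation of [2]: the two toy incidence matrices, the unit weights, the fixed number of 50 rounds, the simultaneous update of both label matrices, and the helper name `transition_matrix` are all hypothetical. After every diffusion step, the rows of the labeled vertices are reset to their initial one-hot labels.

```python
import numpy as np


def transition_matrix(H, w):
    """Row-normalized similarity matrix of one hypergraph, Eqs. (5.19)-(5.22)."""
    delta = H.sum(axis=0)                      # hyperedge degrees
    A = H @ np.diag(w / delta) @ H.T           # similarity matrix, Eq. (5.20)
    return A / A.sum(axis=1, keepdims=True)    # normalize each row, Eq. (5.22)


# Two toy hypergraphs over the same 5 vertices (hypothetical data).
H1 = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
H2 = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
P1 = transition_matrix(H1, np.ones(3))
P2 = transition_matrix(H2, np.ones(2))

# Initial label matrix: vertex 0 is class 0, vertex 4 is class 1, others unknown.
labeled = np.array([0, 4])
Y0 = np.full((5, 2), 0.5)
Y0[0] = [1.0, 0.0]
Y0[4] = [0.0, 1.0]

Y1, Y2 = Y0.copy(), Y0.copy()
for _ in range(50):
    Y1_next = P1 @ Y2                 # hypergraph 1 diffuses the output of hypergraph 2
    Y2_next = P2 @ Y1                 # hypergraph 2 diffuses the output of hypergraph 1
    Y1_next[labeled] = Y0[labeled]    # reset the labeled rows
    Y2_next[labeled] = Y0[labeled]
    Y1, Y2 = Y1_next, Y2_next

Y_final = (Y1 + Y2) / 2.0             # Eq. (5.27) with T = 2
predicted = Y_final.argmax(axis=1)
```

Since both transition matrices are row-stochastic, every row of the label matrices stays a valid distribution over the two classes throughout the diffusion.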

This diffusion process can also be used for a single hypergraph, and the 
framework can be described in Fig. 5.5. 


Fig. 5.5 An illustration of the diffusion process on a single hypergraph


5.3 Data Clustering on Hypergraph


Data clustering is a typical machine learning task that aims to group data into
clusters. In this section, we introduce hypergraph-based data clustering methods,
which can utilize the hypergraph structure to better find the correlations behind
the data. Hypergraph clustering methods can be divided into two types, structural
hypergraph clustering and attribute hypergraph clustering, according to the
data information in the hypergraph. In a structural hypergraph, the clustering tasks
only use structural information. For example, the hypergraph spectral clustering
method [7] extends spectral clustering on graphs and uses the hypergraph Laplacian
to learn complex relations between vertices in the hypergraph. Some
auto-encoder-based techniques [9] are also applied to structural clustering. In an
attribute hypergraph, each vertex is usually accompanied by attribute information
from the real world. There are two assumptions as follows:


• Vertices in the same hyperedge have similar attributes.
• Vertices with similar features have similar attributes.


How to balance graph structure information and node feature information is a research
focus of attributed graph clustering [10]. In this way, hypergraphs can utilize the
features, attributes, and structural information of vertices to conduct the data
clustering task.

In this section, we introduce a hypergraph Laplacian smoothing filter and 
an embedded model called adaptive hypergraph auto-encoder (AHGAE) that is 
designed specifically for hypergraph clustering tasks [3]. First, we describe the 
hypergraph Laplacian smoothing filter and derive its low-pass filtering properties 
in the frequency domain. Then, we analyze the influence of each vertex on the 
attributes of its connected hyperedges and the feature of neighbor vertices. Finally, 
we introduce the detailed procedure and framework of the adaptive hypergraph auto- 
encoder. 

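The hypergraph Laplacian smoothing filter, as shown in Fig. 5.6, first merges the
vertex features into hyperedge features, and the feature of hyperedge $e_k$ is defined
as

$$
E_k^{(t)} = \frac{1}{|N(e_k)|} \sum_{v_j \in N(e_k)} X_j^{(t)} = \sum_{v_j \in \mathcal{V}} \frac{h(j, k)}{d_e(k)} X_j^{(t)},
\tag{5.28}
$$

where $e_k$ denotes the $k$-th hyperedge in the hyperedge set $\mathcal{E}$, $v_j$
denotes the $j$-th vertex in the vertex set $\mathcal{V}$, $t$ represents the order,
$N(e_k)$ is the vertex set in hyperedge $e_k$, $E_k$ describes the feature of
hyperedge $e_k$, and $X_j$ describes the feature of the vertex $v_j$.

Fig. 5.6 An illustration for hypergraph Laplacian smoothing filter. This figure is from [3]

After aggregating the vertex features to get the hyperedge features, we can further
combine the vertex features according to the hyperedge weights:

$$
\begin{aligned}
X_i^{(t+1)} &= (1 - \gamma) X_i^{(t)} + \gamma \sum_{e_k \in N(v_i)} \frac{h(i, k)\, w(k)}{d_v(i)} E_k^{(t)} \\
&= (1 - \gamma) X_i^{(t)} + \gamma \sum_{v_j \in \mathcal{V}} \sum_{e_k \in \mathcal{E}} \frac{h(i, k)\, w(k)\, h(j, k)}{d_v(i)\, d_e(k)} X_j^{(t)},
\end{aligned}
\tag{5.29}
$$

where $N(v)$ represents the hyperedges connected to vertex $v$, and
$\gamma \in [0, 1]$ is the weight coefficient of the filter. $\mathbf{D}_v$ denotes the
diagonal matrix of the vertex degrees, $\mathbf{D}_e$ denotes the diagonal matrix of
the hyperedge degrees, and $\mathbf{H}$ is the incidence matrix of the hypergraph. To
obtain a symmetric operator whose spectral radius is at most 1, we can replace
$\mathbf{D}_v^{-1} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top$ with its
symmetric normalized form:

$$
\begin{aligned}
\mathbf{X}^{(t+1)} &= (1 - \gamma) \mathbf{X}^{(t)} + \gamma \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{X}^{(t)} \\
&= \mathbf{X}^{(t)} - \gamma \left( \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \right) \mathbf{X}^{(t)}.
\end{aligned}
\tag{5.30}
$$

Then, with the hypergraph Laplacian
$\mathbf{L} = \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2}$,
the multi-order hypergraph Laplacian smoothing filter can be written as

$$
\mathbf{X}^{(t)} = (\mathbf{I} - \gamma \mathbf{L})^t \mathbf{X}.
\tag{5.31}
$$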


After the eigendecomposition of the hypergraph Laplacian operator
$\mathbf{L} = \mathbf{U} \Lambda \mathbf{U}^{-1}$, the diagonal elements of the
diagonal matrix $\Lambda$ are the eigenvalues of $\mathbf{L}$. The frequency response
function is

$$
p(\Lambda) = \mathrm{diag}\left( p(\lambda_1), \ldots, p(\lambda_n) \right),
\tag{5.32}
$$

$$
p(\lambda) = 1 - \gamma \lambda, \quad \gamma \in [0, 1].
\tag{5.33}
$$

Since the eigenvalues of the hypergraph Laplacian satisfy $\lambda \in [0, 1]$,
$p(\Lambda)$ is a positive semi-definite matrix, and the value of $p(\lambda)$
decreases as $\lambda$ increases. Therefore, the hypergraph Laplacian smoothing
filter can effectively suppress high-frequency signals:

$$
\mathbf{F} = \mathbf{U} p(\Lambda) \mathbf{U}^{-1} = \mathbf{U} (\mathbf{I} - \gamma \Lambda) \mathbf{U}^{-1} = \mathbf{I} - \gamma \mathbf{L}.
\tag{5.34}
$$

Fig. 5.7 The framework of the adaptive hypergraph auto-encoder. This figure is from [3]
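
The low-pass behavior of the filter can be observed numerically. The sketch below is an illustration under our own assumptions (a toy incidence matrix, unit weights, a filter coefficient of 0.8, order 3, and random features). It builds the symmetric normalized Laplacian, applies the multi-order filter of Eq. (5.31), and checks the frequency response of Eq. (5.33) through an eigendecomposition. The quantity `energy` is the quadratic form trace(X^T L X), which we use here as a simple measure of how non-smooth the features are.

```python
import numpy as np

# Toy hypergraph: 5 vertices (rows) and 3 hyperedges (columns).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
w = np.ones(3)
n = H.shape[0]

d_v = H @ w
delta = H.sum(axis=0)
Dv_inv_sqrt = np.diag(d_v ** -0.5)
# Symmetric normalized hypergraph Laplacian used in Eq. (5.30).
L = np.eye(n) - Dv_inv_sqrt @ H @ np.diag(w / delta) @ H.T @ Dv_inv_sqrt

gamma, order = 0.8, 3                 # assumed filter coefficient and order
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 4))           # hypothetical vertex features

F = np.eye(n) - gamma * L             # single-step filter, Eq. (5.34)
X_smooth = np.linalg.matrix_power(F, order) @ X   # Eq. (5.31)

# Frequency response: p(lambda) = 1 - gamma * lambda on the spectrum of L.
lam, U = np.linalg.eigh(L)            # eigenvalues in ascending order
response = 1.0 - gamma * lam
F_from_spectrum = U @ np.diag(response) @ U.T


def energy(M):
    """Quadratic form trace(M^T L M); larger values mean less smooth features."""
    return np.trace(M.T @ L @ M)
```

Because the response decreases with the eigenvalue, components associated with large eigenvalues are damped most, and the energy of `X_smooth` does not exceed that of `X`.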


Figure 5.7 illustrates how to use the relational reconstruction auto-encoder after
getting the smoothed feature matrix to conduct vertex representation learning in a
low-dimensional space without losing information. First, the incidence
matrix is used to generate the adjacency matrix:

$$
\mathbf{A} = \varepsilon\left( \mathbf{H} \mathbf{H}^\top \right),
\tag{5.35}
$$

$$
\varepsilon(x) =
\begin{cases}
1, & x > 0, \\
0, & \text{otherwise}.
\end{cases}
\tag{5.36}
$$

A single fully connected layer is used to compress the filtered feature matrix:

$$
\mathbf{Z} = \mathrm{scale}\left( \mathbf{X}_{sm} \Theta \right),
\tag{5.37}
$$

$$
\mathrm{scale}(x) = \frac{x - \min(x)}{\max(x) - \min(x)},
\tag{5.38}
$$

where $\mathbf{Z}$ represents the vertex embedding matrix, which includes both
structural and feature information, and $\Theta$ is the learnable parameter that is
used to extract features from the vertices. $\mathrm{scale}(\cdot)$ represents a
normalization function that rescales the range of the vertex features to $[0, 1]$.
The similarity matrix for vertex features is then

$$
\mathbf{S} = \mathrm{sigmoid}\left( \mathbf{Z} \mathbf{Z}^\top \right),
\tag{5.39}
$$

$$
\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}.
\tag{5.40}
$$

This is the inner product decoder used to reconstruct the relations between each
vertex and its neighbors. The objective is to minimize the error between the adjacency
matrix $\mathbf{A}$ and the similarity matrix $\mathbf{S}$. However, using Eq. (5.35)
to construct an adjacency matrix leads to a problem: the number of edges is too large
when the hyperedge degree increases. To solve this problem, the elements in matrix
$\mathbf{A}$ are weighted as

$$
W_{ij} =
\begin{cases}
\dfrac{|\mathcal{V}|^2 - \sum_{i} \sum_{j} A_{ij}}{\sum_{i} \sum_{j} A_{ij}}, & A_{ij} = 1, \\
1, & \text{otherwise}.
\end{cases}
\tag{5.41}
$$

The reconstruction loss can be calculated by using the weighted binary
cross-entropy function:

$$
L_{re} = \frac{1}{|\mathcal{V}|^2} \sum_{i=1}^{|\mathcal{V}|} \sum_{j=1}^{|\mathcal{V}|} -W_{ij} \left[ A_{ij} \log S_{ij} + (1 - A_{ij}) \log (1 - S_{ij}) \right].
\tag{5.42}
$$
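
A forward pass of the reconstruction objective can be sketched as below. This is only an illustration under stated assumptions and not the implementation of [3]: the adjacency is taken as the binarized product of the incidence matrix with its transpose, the weight of the connected entries is taken as the ratio of unconnected to connected entries, and the smoothed features `X_sm` and the parameter `Theta` are random stand-ins. No training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hypergraph: 5 vertices (rows) and 3 hyperedges (columns).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
n = H.shape[0]

# Assumed reading of Eqs. (5.35)-(5.36): vertices sharing a hyperedge are adjacent.
A = (H @ H.T > 0).astype(float)

# Assumed reading of Eq. (5.41): re-weight the connected entries.
pos_weight = (n * n - A.sum()) / A.sum()
W_ij = np.where(A == 1, pos_weight, 1.0)

X_sm = rng.normal(size=(n, 4))        # hypothetical smoothed features
Theta = rng.normal(size=(4, 2))       # hypothetical (untrained) layer weights


def scale(x):
    """Min-max normalization to [0, 1], Eq. (5.38)."""
    return (x - x.min()) / (x.max() - x.min())


Z = scale(X_sm @ Theta)                     # vertex embeddings, Eq. (5.37)
S = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))        # inner product decoder, Eq. (5.39)

# Weighted binary cross-entropy averaged over all |V|^2 vertex pairs, Eq. (5.42).
loss = np.mean(-W_ij * (A * np.log(S) + (1.0 - A) * np.log(1.0 - S)))
```

In practice `Theta` would be updated by gradient descent on `loss` using an automatic differentiation framework, which is omitted here.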


The relational reconstruction auto-encoder can be trained to produce the learned
vertex embeddings, and the spectral clustering technique can then be used to
obtain the final clustering results.


5.4 Cost-Sensitive Learning on Hypergraph


Many machine learning applications involve cost-sensitive scenarios.
Different types of errors in real-world tasks may result in losses
of varying severity. In diagnosis, for example, misdiagnosing a patient as a
healthy person is significantly more costly than classifying a healthy individual
as a patient, as shown in Fig. 5.8. Similar cases arise in software defect
prediction, where misjudging a defective software module as a good one may destroy
the software system and have disastrous repercussions. Cost-sensitive learning
methods [11-13] have been developed to deal with these issues.

In many cases, the data from a group of categories may be sufficient, while the data
from other categories may be very limited. These imbalanced data distributions lead
to different costs for the classification performance of different categories. Under
such circumstances, imbalanced learning [13, 14], which aims to attain accurate
predictions from imbalanced sampling data, attracts much attention. In traditional
methods, sampling methods [15, 16] are used to over-sample the minority class and
under-sample the majority class to solve the imbalanced sample problem. Another
way is to conduct cost-sensitive learning that can focus more on the minority class.

To confront the cost-sensitive issue in hypergraph computation, in this section,
we introduce the cost-sensitive hypergraph computation framework [4] and cost interval
optimization for hypergraph computation [5], respectively. First, we describe how
to quantify cost in the hypergraph modeling procedure [4], in which a fixed cost
value is provided for modeling, and thereafter, we illustrate how to use the
cost-sensitive hypergraph computation approach to tackle imbalanced problems. As an
exact cost value for mis-classification may not be available in practice, we then
introduce the hypergraph computation method with cost interval optimization [5],
which can utilize the cost chosen inside the interval while generating data with
high-order relations. Figure 5.9 shows the frameworks of hypergraph computation
under cost-sensitive scenarios, from traditional hypergraph modeling, through
hypergraph modeling with a cost matrix, to hypergraph modeling with a cost matrix
using a cost interval.

Fig. 5.8 A medical example of cost-sensitive classification scenario

Fig. 5.9 The frameworks of hypergraph computation under cost-sensitive scenarios


(1) Cost-Sensitive Hypergraph Computation 


In this part, we introduce a cost-sensitive hypergraph computation method [4], 
and Fig. 5.10 shows the framework of this method. This framework consists of 
two stages to handle the cost-sensitive issue: F-measure is used in the initial step 
to calculate candidate cost information for cost-sensitive learning, and then the 
hypergraph structure is utilized to model the high-order correlations among the data 
in the second stage. 

First, we introduce hypergraph modeling with a cost matrix. In traditional
hypergraph modeling, each vertex represents a subject, and the hyperedges
connect related vertices. To introduce cost information in hypergraph modeling, a
cost matrix is associated with each vertex, indicating different costs for
mis-classification, as shown in Fig. 5.11 for a binary classification task. The
definition of the cost matrix is as follows.

As shown in Fig. 5.11, the cost matrix is a 2 × 2 matrix, including the true positive
cost $C_{TP}$, the true negative cost $C_{TN}$, the false positive cost $C_{FP}$, and
the false negative cost $C_{FN}$. The true positive cost and the true negative cost are
mostly 0 in the matrix since they correspond to correct predictions. The cost-sensitive
hypergraph's propensity for each class is achieved by giving different values to the
false positive cost and the false negative cost in the cost matrix. As a special case,
if the false positive cost and the false negative cost are equal, the cost-sensitive
hypergraph reduces to traditional hypergraph modeling.

We generate candidate cost information at first and then apply F-measure to 
reduce the expense for both binary and multi-class data. For a classifier h, we can 
define the error profile as 


$$\mathbf{v}(h) = \left(FN_1(h), FP_1(h), \ldots, FN_{N_c}(h), FP_{N_c}(h)\right), \tag{5.43}$$

where $N_c$ represents the number of classes, and $FN_k$ and $FP_k$ represent the false
negative and the false positive probabilities of the $k$-th class. For simplicity, we let
$v_{2k-1}$ represent the FN probability of the $k$-th class and $v_{2k}$ represent the FP
probability of the $k$-th class. The F-measure for binary classification can be defined as


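$$F_\beta(\mathbf{v}) = \frac{(1+\beta^2)\left(P_1 - v_1(h)\right)}{(1+\beta^2)P_1 - v_1(h) + v_2(h)}, \tag{5.44}$$

where $P_k$ represents the marginal probability of class $k$. Similarly, the micro-F-measure
for multi-class classification can be defined as

$$mcF_\beta(\mathbf{v}) = \frac{(1+\beta^2)\left(1 - P_1 - \sum_{k=2}^{N_c} v_{2k-1}\right)}{(1+\beta^2)(1 - P_1) - \sum_{k=2}^{N_c} v_{2k-1} + v_1}. \tag{5.45}$$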
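
As a quick sanity check of Eq. (5.44), the sketch below evaluates the binary F-measure directly from the error profile and compares it with the usual precision/recall form. The function names and the example numbers are illustrative assumptions, not part of the method in [4].

```python
def f_beta_from_errors(p1, fn, fp, beta=1.0):
    """Binary F-measure of Eq. (5.44) computed from the error profile.

    p1 is the marginal probability of class 1; fn and fp are the false
    negative and false positive probabilities (v_1 and v_2 in the text).
    """
    b2 = beta ** 2
    denom = (1.0 + b2) * p1 - fn + fp
    if denom <= 0:
        raise ValueError("the denominator of Eq. (5.44) must be positive")
    return (1.0 + b2) * (p1 - fn) / denom


def f_beta_from_precision_recall(p1, fn, fp, beta=1.0):
    """The same quantity through precision and recall, as a cross-check."""
    tp = p1 - fn                      # true positive probability
    precision = tp / (tp + fp)
    recall = tp / p1
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

Both functions agree because the true positive probability equals $P_1 - v_1$, so the denominator $(1+\beta^2)TP + \beta^2 FN + FP$ simplifies to $(1+\beta^2)P_1 - v_1 + v_2$.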


We can further divide the F-measure values in the range [0, 1] into a collection of
equally spaced values $F = \{f_i\}$ to calculate the cost of various mis-classifications.
The cost function $\Upsilon$ is then used to construct the cost vector for every $f_i$. For
binary classification, we constrain the denominator of Eq. (5.44) to be positive; then
$F_\beta(\mathbf{v}) \le f$ for a value $f$ of the F-measure is equivalent to

$$(1+\beta^2 - f)\,v_1 + f\,v_2 + (1+\beta^2)P_1(f - 1) \ge 0. \tag{5.46}$$

Therefore, the costs of $v_1$ and $v_2$ can be allocated as $1+\beta^2 - f$ and $f$, respectively,
and the cost function can be written as follows:

$$\Upsilon(f) = \begin{cases} 1+\beta^2 - f, & \text{if sample from class 1} \\ f, & \text{if sample from class 2} \\ 0, & \text{otherwise} \end{cases} \tag{5.47}$$

Similarly, the cost function of multi-class classification can be written as follows:

$$\Upsilon(f) = \begin{cases} 1+\beta^2 - f, & \text{if sample from odd class and not from class 1} \\ f, & \text{if sample from class 1} \\ 0, & \text{otherwise} \end{cases} \tag{5.48}$$
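
The candidate-cost generation for the binary case can be sketched as follows. The grid size and the function names are illustrative assumptions; the cost pair follows Eq. (5.47).

```python
def binary_costs(f, beta=1.0):
    """Cost pair of Eq. (5.47): (cost for class 1, cost for class 2)."""
    return (1.0 + beta ** 2 - f, f)


def candidate_costs(num_values, beta=1.0):
    """Equally spaced F-measure values in [0, 1] with their cost pairs."""
    if num_values < 2:
        raise ValueError("need at least two grid points")
    grid = [i / (num_values - 1) for i in range(num_values)]
    return [(f, binary_costs(f, beta)) for f in grid]
```

For example, `candidate_costs(5)` yields five candidates; the middle one has $f = 0.5$ and costs 1.5 and 0.5 for the two classes when $\beta = 1$.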


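The cost obtained from F-measure optimization is added to the objective function to
improve the performance of the hypergraph computation method on imbalanced data. We
first regard each sample as a vertex of the hypergraph and then apply the k-nearest-
neighbor algorithm to construct the hypergraph. The cost-sensitive hypergraph
differs in that it includes the cost matrix information of each vertex in addition
to the original hypergraph correlation structure. With training and testing samples
represented by $\Theta$, the cost-sensitive hypergraph computation objective can be expressed
as

$$\arg\min_{\omega, \mathbf{W}} \left\{ \mu\,\Omega(\omega) + R_{emp}(\omega) + \lambda\,\Psi(\mathbf{W}) \right\}, \quad \text{s.t. } \sum_{j=1}^{N_e} W_{j,j} = 1,\ \forall\, W_{j,j} \ge 0, \tag{5.49}$$

where $\Omega(\omega) = (\Theta\omega)^\top \Delta (\Theta\omega)$ is the hypergraph Laplacian regularizer with
hypergraph Laplacian $\Delta$, $R_{emp}(\omega) = \|\Upsilon(\Theta\omega - y)\|_2^2 = \sum_i \left(\Upsilon_{i,i}(\theta_i\omega - y_i)\right)^2$ is
the empirical loss using cost information, in which $\Upsilon$ is a diagonal matrix whose entry
$\Upsilon_{i,i}$ is the cost of the $i$-th sample and $\theta_i$ is the $i$-th row of $\Theta$,
$\Psi(\mathbf{W}) = \|\mathbf{W}\|_F^2$ is the hypergraph regularization,
$\omega$ is the mapping vector to be learnt, $\mathbf{W}$ is a diagonal matrix of
hyperedge weights, $N_e$ is the number of hyperedges, and $\mu$ and $\lambda$ are trade-off
hyperparameters. We first fix $\mathbf{W}$ to optimize $\omega$, and the optimization problem becomes

$$\arg\min_{\omega} \left\{ \|\Upsilon(\Theta\omega - y)\|_2^2 + \mu\,(\Theta\omega)^\top \Delta (\Theta\omega) \right\}. \tag{5.50}$$

The optimal $\omega$ can be obtained as

$$\omega = \left(\Theta^\top \Upsilon^2 \Theta + \mu\,\Theta^\top \Delta \Theta\right)^{-1} \left(\Theta^\top \Upsilon^2 y\right). \tag{5.51}$$

Following that, we fix $\omega$ to update $\mathbf{W}$:

$$\arg\min_{\mathbf{W}} \left\{ \mu\,(\Theta\omega)^\top \Delta (\Theta\omega) + \lambda \|\mathbf{W}\|_F^2 \right\}, \quad \text{s.t. } \sum_{j=1}^{N_e} W_{j,j} = 1,\ \forall\, W_{j,j} \ge 0. \tag{5.52}$$

We can have $\mathbf{W}$ as

$$\mathbf{W} = \frac{\mu A^\top A (D_e)^{-1} - \eta I}{2\lambda}, \tag{5.53}$$

where only the diagonal entries are kept since $\mathbf{W}$ is diagonal, $\eta$ can be calculated as
$\eta = \frac{\mu A (D_e)^{-1} A^\top - 2\lambda}{N_e}$, and $A$ can be calculated as
$A = (\Theta\omega)^\top (D_v)^{-1/2} H$. The optimized mapping vector $\omega$ allows sample $\theta_j$ in the
test set to obtain the classification result $y_j = \theta_j\omega$.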
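
One round of the alternating updates in Eqs. (5.51) and (5.53) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: `X` stands for the sample matrix, `cost` holds the diagonal of the cost matrix, vertex degrees are computed from the current hyperedge weights and held fixed during the weight update, and the non-negativity constraint is not enforced. None of these names come from [4].

```python
import numpy as np


def hypergraph_laplacian(H, w):
    """Normalized Laplacian I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w                        # weighted vertex degrees
    de = H.sum(axis=0)                # hyperedge degrees
    G = H / np.sqrt(dv)[:, None]      # Dv^{-1/2} H
    return np.eye(H.shape[0]) - G @ np.diag(w / de) @ G.T


def update_omega(X, y, cost, lap, mu):
    """Eq. (5.51): mapping vector for fixed hyperedge weights."""
    c2 = cost ** 2
    lhs = X.T @ (c2[:, None] * X) + mu * X.T @ lap @ X
    rhs = X.T @ (c2 * y)
    return np.linalg.solve(lhs, rhs)


def update_weights(X, omega, H, w, mu, lam):
    """Eq. (5.53): hyperedge weights for a fixed mapping vector."""
    dv = H @ w                                # degrees from current weights
    de = H.sum(axis=0)
    a = ((X @ omega) / np.sqrt(dv)) @ H       # A = (X omega)^T Dv^{-1/2} H
    s = mu * a ** 2 / de                      # diagonal of mu A^T A De^{-1}
    eta = (s.sum() - 2.0 * lam) / len(w)      # Lagrange multiplier
    return (s - eta) / (2.0 * lam)
```

By construction the updated weights sum to one. If some of them turn negative for a small $\lambda$, a projection onto the simplex would be needed, which this sketch leaves out.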

Each candidate cost $c_i$ generates a cost matrix $\Upsilon$, which is
then used to build a cost-sensitive hypergraph structure $\mathcal{G}$. The model then
selects, among these candidates, the cost-sensitive hypergraph with the highest
F-measure as the final choice.

(2) Cost Interval Optimization for Hypergraph Computation


As the cost value for cost-sensitive hypergraph modeling is not easy to determine
in practice, in this part we introduce a cost interval optimization method
for hypergraph computation [5], in which the fixed cost value is replaced by a cost
interval, which is much easier to provide.

Given a hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$, the regularization framework of the cost-
sensitive hypergraph can be divided into three components, i.e., the empirical loss
using cost information, the hypergraph Laplacian regularizer, and the hypergraph
regularization, in order to optimize the overall cost by adding the mis-classification
costs of various categories to the hypergraph framework.

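The empirical loss using cost information can be formulated as

$$R_{emp}(\omega) = \|\Upsilon(S\omega - y)\|_2^2 = \sum_{i=1}^{N_v} \left(\Upsilon_{i,i}(s_i\omega - y_i)\right)^2, \tag{5.54}$$

where $\omega$ represents the mapping vector, and $\Upsilon$ is a diagonal matrix representing mis-
classification cost weights. The hypergraph Laplacian regularizer can be written as

$$\Omega(\omega) = \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{v_i, v_j \in \mathcal{V}} \frac{W(e) H(v_i, e) H(v_j, e)}{\delta(e)} \left( \frac{s_i\omega}{\sqrt{d(v_i)}} - \frac{s_j\omega}{\sqrt{d(v_j)}} \right)^2 = (S\omega)^\top \Delta (S\omega). \tag{5.55}$$

To adjust the hyperedge weights and hence the hypergraph classification ability,
the hypergraph regularization is written as $\Psi(\mathbf{W}) = \|\mathbf{W}\|_F^2$. It is noted that this part
can be removed in different applications, if not required.

Combining the above three, the whole optimization task for cost-sensitive
hypergraph computation can be written as

$$\arg\min_{\omega, \mathbf{W}} \left\{ \|\Upsilon(S\omega - y)\|_2^2 + \mu\,(S\omega)^\top \Delta (S\omega) + \lambda \|\mathbf{W}\|_F^2 \right\}, \quad \text{s.t. } \sum_{j=1}^{N_e} W_{j,j} = 1,\ \forall\, W_{j,j} \ge 0, \tag{5.56}$$

where $\mu$ and $\lambda$ are the trade-off hyperparameters.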

The precise cost of each category is required for cost-sensitive hypergraph
computation, but in practice the exact cost is often unavailable, and it is only
known that the cost lies within a cost interval $[C_{min}, C_{max}]$. A simple
idea is to try all values inside the cost interval and minimize the overall cost.
However, this is inefficient when the cost interval is large. As the actual cost
is difficult to establish, we need to find a surrogate cost $c^*$ to guide the optimization
procedure, and the surrogate classifier $h^*$ is expected to be as successful as the true-
cost classifier $h'$. In this way, the problem can be formulated as

$$\min_{h, c^*} L(h, c^*), \quad \text{s.t. } p\left(L(h, c) < \theta\right) > 1 - \delta,\ \forall c \in [C_{min}, C_{max}],\ C_{min} \le c^* \le C_{max}, \tag{5.57}$$

where $L(h, c)$ is the empirical risk, formulated as
$L(h, c) = \sum_i c\, I(p_i \ne y_i \wedge y_i = +) + I(p_i \ne y_i \wedge y_i = -)$, where $I(\cdot)$ is the
indicator function, $p_i = s_i\omega$ is the predicted label of the $i$-th
sample in the test set, and $+$ and $-$ represent the label of the important class
and the unimportant class, respectively.

The worst-case risk is considered first to guarantee that all constraints can be
satisfied. The worst-case classifier $h^*$ can be written as

$$h^* = \arg\min_{h} \sup_{c} L(h, c) \tag{5.58}$$

and

$$p\left(\sup_{c} L(h^*, c) < \theta\right) > 1 - \delta. \tag{5.59}$$

We have $p\left(L(h^*, c) < \theta\right) > 1 - \delta$ for any $c$. The worst-case risk is attained
when the surrogate cost $c^*$ equals $C_{max}$. However, this only yields a solution that
satisfies the constraints, and the surrogate cost is not guaranteed to
be close to the true cost. As the mean cost attains the smallest maximum distortion
of the true risk, it is another good choice, which can be calculated as $C_{mean} =
0.5(C_{max} + C_{min})$.

With the two surrogate costs $C_{max}$ and $C_{mean}$, we can conduct cost interval
optimization. First, $C_{max}$ is used as the surrogate cost, and a collection of cost-
sensitive hypergraph structures with varying parameter values is learned. Then,
$C_{mean}$ is used as the surrogate cost to evaluate the overall cost of each candidate on
the validation set, and the hypergraph structure with the lowest overall cost is chosen as the final solution.
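
The second stage of this procedure can be sketched as a simple selection loop. The sketch assumes the candidate models have already been trained with the maximum cost as surrogate and that their validation predictions are available in a dictionary; the names `total_cost` and `select_by_mean_cost` and the +1/-1 label encoding are illustrative assumptions.

```python
def total_cost(pred, truth, c):
    """Empirical risk L(h, c): a mis-classified important (+1) sample
    costs c, a mis-classified unimportant (-1) sample costs 1."""
    cost = 0.0
    for p, y in zip(pred, truth):
        if p != y:
            cost += c if y == 1 else 1.0
    return cost


def select_by_mean_cost(candidates, truth, c_min, c_max):
    """Pick the candidate with the lowest validation cost under C_mean.

    candidates maps a model name to its predictions on the validation set.
    """
    c_mean = 0.5 * (c_min + c_max)
    return min(candidates, key=lambda name: total_cost(candidates[name], truth, c_mean))
```

With a cost interval of [1, 5], a candidate that misses one important sample is charged 3, whereas a candidate that raises one false alarm is charged 1, so the latter is preferred.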

In this section, we describe cost-sensitive hypergraph computation methods. 
Imbalanced data issue is very common in many applications. The cost-sensitive 
hypergraph computation methods introduce cost matrix in hypergraph modeling, 
and both fixed cost value and cost interval can be used in the learning process. 


5.5 Link Prediction on Hypergraph 


Link prediction is a fundamental task in network analysis. The objective of link 
prediction is to predict whether two vertices in a network may have a link. Link 
prediction has wide applications in different domains, such as social relation 
exploration [17, 18], protein interaction prediction [19, 20], and recommender
system [21, 22], which has attracted much attention in the past decades. 

Link prediction on hypergraph aims to discover missing relations or predict 
new coming hyperedges based on the observed hypergraph, where hypergraph 
computation can be used to deeply exploit the underneath high-order correlations 
among these data. Unlike the link prediction task on the graph structure [23, 24], 
the hypergraph models the high-order correlation among the data, which is hetero- 
geneous in many applications, as the vertices are in different types. For example, 
in a bibliographic network, the vertex can represent a paper, an author, or a venue, 
while the hyperedge represents the relation where the paper is written by multiple 
authors and published in a venue. These different types of vertices do not necessarily 
share the same representation space. The heterogeneous hypergraph consists of two 
kinds of vertex in the view of the hypergraph event, i.e., identifier vertex and slave 
vertex. Identifier vertex is the vertex that determines a hyperedge uniquely, while 
slave vertex is the other vertex except for the identifier vertex. In this section, we 
introduce the Heterogeneous Hypergraph Variational Auto-encoder (HeteHG-VAE) 
method [6] for heterogeneous hypergraph link prediction task. 

The overview of HeteHG-VAE can be found in Fig. 5.12. HeteHG-VAE aims
to learn the low-dimensional heterogeneous hypergraph embedding based on the 
Bayesian deep generative strategy. The input hypergraph is represented by the 
incidence matrix H, whose sub-hypergraph represents the hypergraph generated 
by different types of slave vertices. The heterogeneous encoder can project the 
vertices and the hyperedges to the vertex embedding and hyperedge embedding, 
respectively. The hypergraph embedding is the combination of the vertex embedding 
and the hyperedge embedding, which can be used for reconstructing the incidence 
matrix by the hypergraph decoder. 

In the following part of this section, we first introduce the variational evidence 
lower bound with the task specific derivation. Then, the inference model, including 
the heterogeneous vertex encoder and the heterogeneous hyperedge encoder, is 
presented. At last, the generative model and the link prediction method are 
introduced. 

Denote $\{x_k\}_{k=1}^{K}$ as the observed data with the total number $K$, $Z_k^V$ as the
latent vertex embedding, and $Z^E$ as the latent hyperedge embedding. HeteHG-VAE
assumes that $Z_k^V$ and $Z^E$ are drawn i.i.d. from a Gaussian prior, i.e., $Z_k^V \sim p_0(Z_k^V)$
and $Z^E \sim p_0(Z^E)$, and $x_k$ are drawn from the conditional distribution, $x_k \sim
p(x_k \mid Z_k^V, Z^E; \lambda_k)$, where $\lambda_k$ is the parameter of the distribution. The objective of
HeteHG-VAE is to maximize the log-likelihood of the observed data by optimizing
$\lambda_k$ as follows:

$$\begin{aligned}
\log p(x_1, \ldots, x_K; \lambda)
&= \log \int_{Z_1^V} \cdots \int_{Z_K^V} \int_{Z^E} p(x_1, \ldots, x_K, Z_1^V, \ldots, Z_K^V, Z^E; \lambda)\, dZ_1^V \cdots dZ_K^V\, dZ^E \\
&\ge \mathbb{E}_q \left[ \log \frac{p(x_1, \ldots, x_K, Z_1^V, \ldots, Z_K^V, Z^E; \lambda)}{q(Z_1^V, \ldots, Z_K^V, Z^E \mid x_1, \ldots, x_K; \theta)} \right] \\
&:= \mathcal{L}(x_1, \ldots, x_K; \theta, \lambda),
\end{aligned} \tag{5.60}$$

where $q(\cdot)$ is the variational posterior for the estimation of the true posterior
$p(Z_1^V, \ldots, Z_K^V, Z^E \mid x_1, \ldots, x_K)$, which is inaccessible, and $\theta$ is the parameter to
be estimated. Then, $\mathcal{L}(x_1, \ldots, x_K; \theta, \lambda)$ is the evidence lower bound of the log
marginal likelihood. Based on the evidence lower bound, an inference encoder is
presented to parameterize $q$, and a generative decoder is used to parameterize $p$.

The inference encoder of HeteHG-VAE consists of two main parts, i.e., the
heterogeneous vertex encoder and the heterogeneous hyperedge encoder. The
heterogeneous vertex encoder first maps the observed data $x_k$ to a latent space $\tilde{Z}_k^V$, which
can be written as

$$\tilde{Z}_k^V = f^V\left(x_k W_k^V + b_k^V\right), \tag{5.61}$$

where $W_k^V$ and $b_k^V$ are the to-be-learned weights of the model, and $f^V$ is a nonlinear
activation function. Two separate linear layers map the latent representation to the
means $\mu_k^V$ and variances $\sigma_k^V$ of $q$:

$$\mu_k^V = \tilde{Z}_k^V W_k^{V\mu} + b_k^{V\mu}, \tag{5.62}$$

$$\sigma_k^V = \tilde{Z}_k^V W_k^{V\sigma} + b_k^{V\sigma}, \tag{5.63}$$

where $W_k^{V\mu}$, $b_k^{V\mu}$, $W_k^{V\sigma}$, and $b_k^{V\sigma}$ are learnable parameters. The vertex embedding
$Z_k^V$ is sampled from the Gaussian distribution $\mathcal{N}(\mu_k^V, \sigma_k^V)$.

The heterogeneous hyperedge encoder first maps the observed data $x_k$ to a latent
space $\tilde{Z}_k^E$, which can be written as

$$\tilde{Z}_k^E = f^E\left(x_k^\top W_k^E + b_k^E\right), \tag{5.64}$$

where $W_k^E$ and $b_k^E$ are the to-be-learned weights of the model, and $f^E$ is a nonlinear
activation function. Then, the importance of different types of vertices is learned by
the hyperedge attention mechanism, which can be written as

$$\tilde{\alpha}_k = \mathrm{Tanh}\left(\tilde{Z}_k^E W_k^{E\alpha} + b_k^{E\alpha}\right) P, \tag{5.65}$$

where $W_k^{E\alpha}$, $b_k^{E\alpha}$, and $P$ are learnable parameters. The attention score $\alpha_k$ is obtained
by normalizing $\tilde{\alpha}_k$, and the hyperedge embedding can be written as

$$\tilde{Z}^E = \sum_{k=1}^{K} \alpha_k \tilde{Z}_k^E. \tag{5.66}$$

Similarly, two separate linear layers map the latent representation to the means $\mu^E$
and variances $\sigma^E$ of the distribution $q$:

$$\mu^E = \tilde{Z}^E W^{E\mu} + b^{E\mu}, \tag{5.67}$$

$$\sigma^E = \tilde{Z}^E W^{E\sigma} + b^{E\sigma}, \tag{5.68}$$

where $W^{E\mu}$, $b^{E\mu}$, $W^{E\sigma}$, and $b^{E\sigma}$ are learnable parameters. The hyperedge embedding
$Z^E$ is sampled from the Gaussian distribution $\mathcal{N}(\mu^E, \sigma^E)$.

The incidence matrix is sampled from a Bernoulli distribution parameterized by
$\hat{H}_{ij}$:

$$p\left(H_{ij} \mid Z_i^V, Z_j^E; \lambda_k\right) = \mathrm{Ber}\left(\hat{H}_{ij}\right), \tag{5.69}$$

where $\hat{H}_{ij}$ is the dot product of the vertex embedding and the hyperedge embedding,
passed through a sigmoid function:

$$\hat{H}_{ij} = \mathrm{Sigmoid}\left(Z_i^V (Z_j^E)^\top\right). \tag{5.70}$$
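
The sampling and decoding steps can be illustrated with a small NumPy sketch. It assumes that the variance branch is parameterized as a log-variance, which is a common implementation choice for variational auto-encoders and is not stated in the text; the function names are also illustrative.

```python
import numpy as np


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample z = mu + std * eps, with std = exp(log_var / 2)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def decode_incidence(z_vertex, z_edge):
    """Bernoulli parameters of Eq. (5.70): sigmoid of vertex-hyperedge dot products.

    z_vertex has one row per vertex, z_edge one row per hyperedge.
    """
    logits = z_vertex @ z_edge.T
    return 1.0 / (1.0 + np.exp(-logits))
```

The returned matrix has one row per vertex and one column per hyperedge, so entries with a high value can be read as likely vertex-hyperedge incidences.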


The likelihood of the connection among vertices could be obtained based on the 
vertex embedding and hyperedge embedding as follows: 


Peon (Zy , Z7) = MZ, Zi 12. (5.71) 


In this section, we have introduced the Heterogeneous Hypergraph Variational
Auto-encoder method [6] for the task of link prediction on hypergraph, which
captures the high-order correlations among the data while preserving the original low-
order topology. Link prediction on hypergraph has shown superior performance in
different experiments and can be further used in other applications.


5.6 Summary 


In this chapter, we introduce four typical hypergraph computation tasks, including 
label propagation, data clustering, imbalance learning, and link prediction. Label 
propagation on hypergraph is to predict the labels for the vertices on a hypergraph, 
i.e., assigning a label to each unlabeled vertex in the hypergraph, based on 
the labeled information. Data clustering on hypergraph divides the vertices in a 
hypergraph into several groups. Imbalanced learning on hypergraph considers the
imbalanced data distributions and introduces cost-sensitive hypergraph computation 
methods. Link prediction on hypergraph discovers missing relations or predicts 
new coming hyperedges based on the observed hypergraph. We note that these 
four tasks are typical ways to use hypergraph computation in practice. Other 
tasks can also be deployed under the hypergraph computation framework, such 
as data regression, data completion, and data generation. Following these typical
hypergraph computation tasks, we can use them in different applications, such as
social media analysis and computer vision.

Chapter 6
Hypergraph Structure Evolution


Abstract In practice, noise exists in the process of data collection and hypergraph
construction. Therefore, missing, redundant, and noisy connections may be
introduced into the generated hypergraph structure, which may lead to inaccurate
inference on hypergraph. Another issue comes from the increasing data stream,
which is also very common in many applications. It is important to consider the 
structure evolution methods on the hypergraph, which optimize the hypergraph 
structure accordingly. Early hypergraph computation methods mainly rely on static 
hypergraph structure, which may suffer from the limitation of the static mechanism 
when confronting random and increasing data scenarios. In this chapter, we intro- 
duce dynamic hypergraph structure evolution methods, including both hypergraph 
component optimization and hypergraph structure optimization. Finally, we briefly 
introduce the incremental learning method on growing data. 


6.1 Introduction 


The hypergraph structure models the high-order and complex correlations among 
data, and thus the quality of topology structure plays an important role in learning 
tasks on hypergraph. As shown in the previous chapter, there have been implicit 
and explicit methods of hypergraph generation from observed data. However, the 
generated hypergraph may contain redundant, missing, and noisy connections due
to the disturbances in the process of data collection and hypergraph construction. 
In other words, there may exist biases between the generated hypergraph and the 
ground truth structure. Under such circumstances, it is essential to optimize the 
hypergraph structure to make it fit the ground truth high-order correlation more 
accurately. The quality of a hypergraph can be directly quantified by comparing it with
the ground truth structure if available, or indirectly evaluated by the performance
of downstream applications. Most existing hypergraph computation methods rely
on static hypergraph structure, such as the k-NN-based method [1], the cluster-based
method [2], and the sparse representation-based method [3]. These methods may suffer
from the inaccurate hypergraph structure that exists in practice. In this chapter, we 
introduce hypergraph structure evolution methods under the dynamic hypergraph 
structure learning mechanism. Hypergraph structure evolution can be divided into 
two main categories, i.e., hypergraph component optimization and hypergraph 
structure optimization. The problem of hypergraph structure evolution is usually 
integrated with the learning process and formulated as a bi-level optimization 
problem. Part of the work introduced in this chapter has been published in [4-7]. 


6.2 Hypergraph Component Optimization 


Besides the main structure of a hypergraph, i.e., the incidence matrix, a hypergraph
is also composed of a group of components such as the weights for hyperedges,
vertices, and even sub-hypergraphs, which play an important role in the hypergraph
structure. Hypergraph component optimization aims to explore the optimal
components of the hypergraphs, i.e., hyperedge weights, vertex weights, and
sub-hypergraph weights. The hyperedge weights represent the strength of each
high-order correlation among data, while the vertex weights represent the importance
of different samples on the structure. In many cases, we may construct multiple 
hypergraphs using multi-modal data or different criteria, which can be regarded as 
sub-hypergraphs. The sub-hypergraph weights are used to measure the importance 
of different sub-hypergraphs on the overall structure. The optimization procedure 
adjusts the hyperedge weights, the vertex weights, and sub-hypergraph weights 
during the training process in order to improve the performances on the downstream 
applications. 


6.2.1 Hyperedge Weight Optimization 


The hyperedge is a basic component of the hypergraph, representing the high-order 
complex correlation among data. The initial hypergraph usually assigns an identical 
weight to all hyperedges. However, hyperedges actually have different effects for a 
given task. The hyperedge weights indicate the importance of different hyperedges 
contributing to the whole structure. In this section, we introduce the hyperedge 
weight learning methods [4], in which the weights of hyperedges are adaptively 
adjusted during the training process, and thus the importance of different hyperedges 
can be automatically modulated. 
We assume that there are $n$ hyperedges in the hypergraph, denoted by
$\{e_1, e_2, \ldots, e_n\}$. The weights of the hyperedges are defined by the $n \times 1$ vector
$\mathbf{w} = [w_1, w_2, \ldots, w_n]^\top$. There is usually a constraint on the hyperedge weights
that their sum is equal to one, i.e., $\sum_{i=1}^{n} w_i = 1$. We use $F$ to denote the output of
hypergraph learning. The problem of learning hyperedge weights can be formulated
mathematically in a dual-optimization form as

$$\arg\min_{F, \mathbf{w}} \Psi(F) := \left\{ \Omega(F) + \lambda R_{emp}(F) + \mu \Phi(\mathbf{w}) \right\}, \quad \text{s.t. } \sum_{e \in \mathcal{E}} W(e) = 1. \tag{6.1}$$

Here, $\Omega(F)$ and $R_{emp}(F)$ are the regularizer and empirical loss of $F$, respectively,
$\Phi(\mathbf{w})$ is the regularizer on $\mathbf{w}$, and $\lambda$ and $\mu$ are the scalars controlling the relative
importance of these three terms.

The general formulation can be implemented by specifying the functions $\Omega(\cdot)$,
$R_{emp}(\cdot)$, and $\Phi(\cdot)$. As stated before, $F$ contains the to-be-learned labels in the node
classification task. The regularizer $\Omega(F)$ can be defined as $F^\top \Delta F$, where $\Delta$ is the
hypergraph Laplacian matrix. The empirical loss $R_{emp}(F)$ in the general form can be instantiated
by the difference between the learned $F$ and the observed labels $Y$ of the training data,
i.e., the least-squares residual. The regularizer on $\mathbf{w}$ is the $\ell_2$-norm. The general
formulation can then be written as

$$\arg\min_{F, \mathbf{w}} \Psi(F) := \left\{ F^\top \Delta F + \lambda \|F - Y\|^2 + \mu \sum_{i=1}^{n} w_i^2 \right\}, \quad \text{s.t. } \sum_{i=1}^{n} w_i = 1. \tag{6.2}$$

The aim of the learning process is to search for the optimal $F$ and $\mathbf{w}$ that
minimize the cost function in Eq. (6.2).

There are two variables to be optimized in Eq. (6.2), which can be solved by
the alternating optimization algorithm. At each step, one of the two variables $F$ and
$\mathbf{w}$ is optimized, while the other is kept fixed.
The details of the alternating optimization strategy are introduced as follows.

Given the initial hyperedge weights, the first step is fixing $\mathbf{w}$ and optimizing
$F$. The sub-problem is written as

$$\arg\min_{F} \Psi(F) = \arg\min_{F} \left\{ F^\top \Delta F + \lambda \|F - Y\|^2 \right\}. \tag{6.3}$$

A closed-form solution of Eq. (6.3) is already known from traditional
hypergraph learning. The solution is written as

$$F = \left(I + \frac{1}{\lambda}\Delta\right)^{-1} Y = \left(I + \frac{1}{\lambda}(I - \Theta)\right)^{-1} Y = \left(\frac{\lambda + 1}{\lambda}\left(I - \frac{1}{\lambda + 1}\Theta\right)\right)^{-1} Y, \tag{6.4}$$

where $\Theta = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$, so that $\Delta = I - \Theta$.
Let $\zeta = \frac{1}{\lambda + 1}$, and Eq. (6.4) can be rewritten as

$$F = (1 - \zeta)(I - \zeta\Theta)^{-1} Y. \tag{6.5}$$
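
The closed-form step can be checked numerically with a short NumPy sketch. The helper names are illustrative assumptions; the code solves a linear system instead of forming the matrix inverse explicitly.

```python
import numpy as np


def theta_matrix(H, w):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w                        # weighted vertex degrees
    de = H.sum(axis=0)                # hyperedge degrees
    G = H / np.sqrt(dv)[:, None]      # Dv^{-1/2} H
    return G @ np.diag(w / de) @ G.T


def propagate_labels(H, w, Y, lam):
    """Closed-form F = (1 - zeta) (I - zeta * Theta)^{-1} Y with zeta = 1 / (lam + 1)."""
    zeta = 1.0 / (lam + 1.0)
    n = H.shape[0]
    return (1.0 - zeta) * np.linalg.solve(np.eye(n) - zeta * theta_matrix(H, w), Y)
```

The result coincides with solving $(I + \frac{1}{\lambda}\Delta)F = Y$ directly, which is the first form in Eq. (6.4).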


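With the updated $F$, the next step is fixing $F$ while optimizing $\mathbf{w}$, and the sub-
problem about $\mathbf{w}$ is

$$\arg\min_{\mathbf{w}} \Psi(F) = \arg\min_{\mathbf{w}} \left\{ F^\top \Delta F + \mu \sum_{i=1}^{n} w_i^2 \right\}, \quad \text{s.t. } \sum_{i=1}^{n} w_i = 1,\ \mu > 0. \tag{6.6}$$

The Lagrangian multipliers method is employed here, and the sub-problem is
replaced with

$$\begin{aligned}
&\arg\min_{\mathbf{w}, \eta} \left\{ F^\top \Delta F + \mu \sum_{i=1}^{n} w_i^2 + \eta \left( \sum_{i=1}^{n} w_i - 1 \right) \right\} \\
=\ &\arg\min_{\mathbf{w}, \eta} \left\{ F^\top \left( I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} \right) F + \mu \sum_{i=1}^{n} w_i^2 + \eta \left( \sum_{i=1}^{n} w_i - 1 \right) \right\}.
\end{aligned} \tag{6.7}$$

Let $\Gamma = D_v^{-1/2} H$, and it can be shown that

$$\eta = \frac{F^\top \Gamma D_e^{-1} \Gamma^\top F - 2\mu}{n} \tag{6.8}$$

and

$$w_i = \frac{1}{n} - \frac{F^\top \Gamma D_e^{-1} \Gamma^\top F}{2n\mu} + \frac{F^\top \Gamma_i D_e^{-1}(i, i) \Gamma_i^\top F}{2\mu}. \tag{6.9}$$

Here, $\Gamma_i$ denotes the $i$-th column of $\Gamma$.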

In this way, $F$ and $\mathbf{w}$ are alternately updated until convergence. Finally, the
optimal values of $F$ and $\mathbf{w}$ are obtained. We note that the above method is a typical
way to optimize the hyperedge weights using the $\ell_2$-norm. Other methods can also
be used to learn the hyperedge weights using different constraints.
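
The weight update of Eq. (6.9) can be sketched in NumPy as follows. Two points are assumptions of this sketch rather than statements from the text: for a label matrix $F$ with several columns the quadratic forms are summed over the columns (a trace), and the vertex degrees `dv` are taken from the current weights and held fixed during the update.

```python
import numpy as np


def update_hyperedge_weights(F, H, dv, mu):
    """Closed-form hyperedge weights for fixed F, following Eq. (6.9).

    F: (num_vertices, num_classes) label matrix; H: incidence matrix;
    dv: vertex degrees; mu: weight of the l2 regularizer on w.
    """
    de = H.sum(axis=0)                      # hyperedge degrees
    gamma = H / np.sqrt(dv)[:, None]        # Gamma = Dv^{-1/2} H
    proj = gamma.T @ F                      # row i is Gamma_i^T F
    a = (proj ** 2).sum(axis=1) / de        # F^T Gamma_i De^{-1}(i,i) Gamma_i^T F
    n = H.shape[1]
    return 1.0 / n - a.sum() / (2.0 * n * mu) + a / (2.0 * mu)
```

The returned weights sum to one by construction. Hyperedges whose vertices carry consistent labels receive larger weights, and a small `mu` can push some weights below zero, in which case an additional non-negativity constraint would be needed.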


6.2.2 Vertex Weight Optimization 


Early hypergraph computation methods may not take the importance of vertices 
into account and mainly focus on the weights of hyperedges. However, the vertex 
set in the hypergraph may have heterogeneous, unbalanced, and outlier problems, 
resulting in performance degeneration of learning process. Therefore, it is highly 
required to consider the weights of vertices to define the impact of different subjects 
during the learning process. For example, vertices belonging to the minority class 
may require larger weights and vice versa for imbalanced data. In this part, we 
introduce the vertex-weighted hypergraph learning method [5], which can update 
the vertex weights during the learning process. 

The aim of vertex-weighted hypergraph learning algorithm is to emphasize the 
vertices with distinguishable information and disregard the redundant vertices that 
bring in bias and noise instead of useful information. On the basis of learning 
hyperedge weights, vertex-weighted learning algorithm further considers the vertex 
weights. Here, let $\{v_1, v_2, \ldots, v_n\}$ denote all $n$ vertices in the hypergraph. The
corresponding weight for vertex $v_i$ is represented by $u_i$. Let $U$ denote the diagonal
matrix of vertex weights. The overall cost function is similar to that of learning hyperedge
weights, but with the impact of $U$ simultaneously taken into consideration. The
general formulation is written as

$$\arg\min_{F, W} \Psi(F) := \left\{ \Omega_U(F) + \lambda R_{emp}(F) + \mu \Phi(W) \right\}, \quad \text{s.t. } W(e) \ge 0,\ \sum_{e \in \mathcal{E}} H(v, e) W(e) = D_v(v). \tag{6.10}$$


The key point of vertex weight optimization is to design a reasonable vertex 
weighting scheme that scores the importance of each subject during the learning 
process. First, the pairwise distances between vertices are calculated based on the 
features. Let $d_{ij}$ denote the distance between vertices $v_i$ and $v_j$, and let $\bar{d}_i$ denote the
mean distance between $v_i$ and all other training vertices with the same label. The
vertex weight is then defined as

$$u_i = \frac{\bar{d}_i}{\sum_{j=1}^{n_{train}} \bar{d}_j}, \tag{6.11}$$

where $n_{train}$ denotes the number of training samples. It is noted that only the
training data are labeled and further weighted. The unlabeled vertices are initialized 
with an identical weight. Normalization is then applied to the vertex weights. This 
weighting scheme can assign higher weights to vertices that are far from other intra- 
class vertices and vice versa. Therefore, the importance of repeated/close samples 
is relatively smaller than the outliers during the hypergraph learning process. 
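
A minimal sketch of this weighting scheme for the labeled training vertices is given below. It assumes Euclidean distances between feature vectors and sets the mean distance to zero for a vertex that is the only member of its class; both choices are illustrative and not prescribed by [5].

```python
import math


def vertex_weights(features, labels):
    """Eq. (6.11): weights proportional to the mean intra-class distance."""
    n = len(features)
    mean_dist = []
    for i in range(n):
        same = [
            math.dist(features[i], features[j])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        ]
        # Assumption: a vertex with no other intra-class sample gets distance 0.
        mean_dist.append(sum(same) / len(same) if same else 0.0)
    total = sum(mean_dist)
    return [d / total for d in mean_dist]
```

For three one-dimensional samples of one class located at 0, 1, and 5, the sample at 5 has the largest mean intra-class distance and therefore receives the largest weight.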

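Since the hypergraph structure is updated with vertex weights, the hypergraph
structure regularizer is different from the initial one. As stated already, the
hypergraph regularizer is defined based on the cut cost. Here, the cut cost is related
not only to the hyperedge weights but also to the vertex weights. In general, the higher
the weight of two vertices, the higher the cut cost. Therefore, the regularizer of the
hypergraph structure $\Omega_U(F)$ is rewritten as

$$\begin{aligned}
\Omega_U(F) &= \sum_{k=1}^{C} \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{W(e) U(u) H(u, e) U(v) H(v, e)}{2\delta(e)} \left( \frac{F(u, k)}{\sqrt{d(u)}} - \frac{F(v, k)}{\sqrt{d(v)}} \right)^2 \\
&= \sum_{k=1}^{C} \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{W(e) U(u) H(u, e) U(v) H(v, e)}{\delta(e)} \left( \frac{F(u, k)^2}{d(u)} - \frac{F(u, k) F(v, k)}{\sqrt{d(u) d(v)}} \right) \\
&= \sum_{k=1}^{C} \left\{ \sum_{u \in \mathcal{V}} U(u) F(u, k)^2 \sum_{e \in \mathcal{E}} \frac{W(e) H(u, e)}{d(u)} \sum_{v \in \mathcal{V}} \frac{H(v, e) U(v)}{\delta(e)} \right. \\
&\qquad \left. - \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{F(u, k) U(u) H(u, e) W(e) H(v, e) U(v) F(v, k)}{\sqrt{d(u) d(v)}\, \delta(e)} \right\} \\
&= \sum_{k=1}^{C} F(:, k)^\top \Delta_U F(:, k) \\
&= F^\top \Delta_U F.
\end{aligned} \tag{6.12}$$

Here, $F(:, k)$ is the $k$-th column of $F$ and $C$ is the number of data categories. $\Delta_U$ is
the vertex-weighted hypergraph Laplacian, which can be defined as

$$\Delta_U = U - \Theta_U = U - D_v^{-1/2} U H W D_e^{-1} H^\top U D_v^{-1/2}. \tag{6.13}$$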
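
The vertex-weighted Laplacian can be assembled with a few lines of NumPy. The sketch assumes that the vertex degree is $d(v) = \sum_e W(e)H(v, e)$ and that the hyperedge degree is weighted by the vertex weights, $\delta(e) = \sum_v U(v)H(v, e)$; these degree definitions are what make the third line of Eq. (6.12) collapse to $U$, and they are stated here as an assumption.

```python
import numpy as np


def vertex_weighted_laplacian(H, w, u):
    """Delta_U = U - Dv^{-1/2} U H W De^{-1} H^T U Dv^{-1/2} (Eq. 6.13).

    H: incidence matrix, w: hyperedge weights, u: vertex weights.
    """
    dv = H @ w                                   # d(v) = sum_e W(e) H(v, e)
    de = H.T @ u                                 # delta(e) = sum_v U(v) H(v, e), assumed
    left = (u / np.sqrt(dv))[:, None] * H        # Dv^{-1/2} U H
    theta_u = left @ np.diag(w / de) @ left.T
    return np.diag(u) - theta_u
```

Under these degree definitions the vector $D_v^{1/2}\mathbf{1}$ lies in the null space of $\Delta_U$, which mirrors the corresponding property of the traditional normalized Laplacian.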




Compared with the traditional hypergraph Laplacian $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$,
the hypergraph Laplacian with weighted vertices takes different weights
of vertices into consideration during the evaluation of the cost on the hypergraph
structure. Therefore, the learning task can be further defined as

$$\arg\min_{F, W} \Psi(F) := \left\{ F^\top \Delta_U F + \lambda \|F - Y\|^2 + \mu \sum_{e \in \mathcal{E}} W(e)^2 \right\}, \quad \text{s.t. } W(e) \ge 0,\ \sum_{e \in \mathcal{E}} H(v, e) W(e) = D_v(v). \tag{6.14}$$

The above optimization problem can be solved by the alternating optimization
algorithm. The sub-problem about $F$ has the closed-form solution as in traditional
hypergraph learning. The sub-problem about $W$ is written as

$$\arg\min_{W} \Psi(F) := \left\{ F^\top \Delta_U F + \mu \sum_{e \in \mathcal{E}} W(e)^2 \right\}, \quad \text{s.t. } W(e) \ge 0,\ \sum_{e \in \mathcal{E}} H(v, e) W(e) = D_v(v). \tag{6.15}$$

The above optimization task can be solved via quadratic programming, since it is
convex in $W$. Through vertex weight optimization, the vertex-weighted hypergraph
structure takes the contribution of each vertex to the whole hypergraph structure
into consideration, and thus it can model the high-order relevance among objects
more accurately. During the learning process, the impact of low-quality training
samples on the structure and subsequent classification tasks decreases continuously,
while high-quality training data, which account for a minority, can be given greater
importance. The additional vertex weights lead to a Laplacian matrix
of the hypergraph that measures data correlation better than the traditional one and
consequently lead to improvement of the classification performance.


6.2.3 Sub-hypergraph Weight Optimization 


Given multiple sub-hypergraphs that are used to jointly formulate the correlation
among data, it is important to measure how these sub-hypergraphs contribute to the
main task. Sub-hypergraph weight optimization adjusts the importance of the sub-
hypergraphs, which model the complex correlations among the multi-modal data.
In this part, we introduce inductive multi-hypergraph learning (iMHL) [7], which
learns the weights of the model and adjusts the weights of the sub-hypergraphs
simultaneously during the training process. iMHL models the high-order correlation of
the multi-modal data with the multi-hypergraph, treats the sub-hypergraph weights as
the modality weights, and learns the mapping from the data to the labels under the
supervised setting. Given testing data, the learned projection can be used to predict the
corresponding labels. The framework of iMHL is illustrated in Fig. 6.1, where the
offline training and online training are both supported by the inductive learning
process, which can handle newly arriving data efficiently.

Fig. 6.1 The framework of inductive multi-hypergraph learning method. This figure is from [7]

Here, we denote $m$ as the total number of all sub-hypergraphs and $\mathcal{G}_i =
(\mathcal{V}_i, \mathcal{E}_i, \mathbf{W}_i)$ as the $i$-th hypergraph for the $i$-th modality. The projection matrices
$M_i$ are combined according to the sub-hypergraph weights and are used to map the data
to the label for prediction. The combination weights $\omega = [\omega_1, \cdots, \omega_m]$ are another
object to be optimized, which represent the weights of the corresponding modalities,
subject to $\sum_{i=1}^{m} \omega_i = 1$ and $\omega \ge 0$.

The loss function W for learning all M; can be formulated as 


y = N wi {82 (Mi) + à Remp Mi) + uP (Mi)] + n D (o), (6.16) 
i=1 


which consists of two main parts, i.e., the summation of the cost of each sub- 
hypergraph and the regularization on the sub-hypergraph weights œ. ®(M) is the 
regularizer on the projection matrix. We assume that the vertices with similar labels 
are connected strongly, and §2(M) can then be written as 


a W(e)H(u, e)H(v,e) (XTMU, k) XTMU, ) V 
iiu A êle) ( Vdu) Jd) ) 


= t«M! XAX M), (6.17) 


where A denotes the normalized hypergraph Laplacian, 


A 2 I- D,'?HWD; !H'p;^., (6.18) 
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The empirical loss term Remp(M) can be written as 
Remp(M) = IIX' M — Y|/’. (6.19) 
4 (M) can be formulated as the £2 ;-norm of M, 
M) = ||MIl2,1, (6.20) 


which produces row sparsity for more informative features. $\Gamma(\boldsymbol{\omega})$ is the $\ell_2$-norm of
the sub-hypergraph weights

$$
\Gamma(\boldsymbol{\omega}) = \| \boldsymbol{\omega} \|^2, \tag{6.21}
$$


which aims to learn the optimal weights for each sub-hypergraph. 
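The inductive multi-hypergraph learning task can be formulated as

$$
\arg\min_{\mathbf{M}_i,\, \boldsymbol{\omega} \geq 0} \sum_{i=1}^{m} \omega_i \left( \Omega(\mathbf{M}_i) + \lambda \mathcal{R}_{emp}(\mathbf{M}_i) + \mu \Phi(\mathbf{M}_i) \right) + \eta \Gamma(\boldsymbol{\omega}). \tag{6.22}
$$

Eq. (6.22) can be split into $m + 1$ sub-problems: each $\mathbf{M}_i$ is optimized
individually, and the combination weights $\boldsymbol{\omega}$ are optimized to fuse all
sub-hypergraphs.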

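The optimization of each $\mathbf{M}_i$, shown below, can be solved by an iterative algorithm:

$$
\arg\min_{\mathbf{M}_i} \Omega(\mathbf{M}_i) + \lambda \mathcal{R}_{emp}(\mathbf{M}_i) + \mu \Phi(\mathbf{M}_i). \tag{6.23}
$$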


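The optimization problem of $\boldsymbol{\omega}$ can then be written as

$$
\arg\min_{\boldsymbol{\omega} \geq 0} \sum_{i=1}^{m} \omega_i \left( \Omega(\mathbf{M}_i) + \lambda \mathcal{R}_{emp}(\mathbf{M}_i) + \mu \Phi(\mathbf{M}_i) \right) + \eta \| \boldsymbol{\omega} \|^2,
\quad \text{s.t.} \ \sum_{i=1}^{m} \omega_i = 1. \tag{6.24}
$$

We denote $\Psi_i = \Omega(\mathbf{M}_i) + \lambda \mathcal{R}_{emp}(\mathbf{M}_i) + \mu \Phi(\mathbf{M}_i)$, and Eq. (6.24) can be
simplified to

$$
\arg\min_{\boldsymbol{\omega} \geq 0} \sum_{i=1}^{m} \omega_i \Psi_i + \eta \| \boldsymbol{\omega} \|^2,
\quad \text{s.t.} \ \sum_{i=1}^{m} \omega_i = 1. \tag{6.25}
$$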

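The Lagrangian method can be applied to solve Eq. (6.25), which can be
formulated as

$$
\arg\min_{\boldsymbol{\omega},\, \zeta} \sum_{i=1}^{m} \omega_i \Psi_i + \eta \sum_{i=1}^{m} \omega_i^2 + \zeta \left( \sum_{i=1}^{m} \omega_i - 1 \right). \tag{6.26}
$$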

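Then, we can have

$$
\zeta = \frac{-\sum_{i=1}^{m} \Psi_i - 2\eta}{m} \tag{6.27}
$$

and

$$
\omega_i = \frac{1}{m} + \frac{\sum_{j=1}^{m} \Psi_j}{2m\eta} - \frac{\Psi_i}{2\eta}. \tag{6.28}
$$

Given the testing sample $x^t = \{x_1^t, \cdots, x_m^t\}$ with features for each modality, the
prediction of the corresponding label can be achieved by

$$
\mathcal{C}(x^t) = \arg\max \sum_{i=1}^{m} \omega_i \, x_i^{t\top} \mathbf{M}_i. \tag{6.29}
$$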


The overall algorithm is shown in Fig. 6.2. Optimizing the sub-hypergraph weights
is effective because incorporating the multi-modal data via multiple sub-hypergraphs
makes it possible to investigate the contributions of different data or
information to the learning process.
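
The closed-form weight update of Eq. (6.28) and the fused prediction of Eq. (6.29) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the list-of-lists matrix layout, and the sample costs are assumptions made here. Note that the closed form does not enforce $\boldsymbol{\omega} \geq 0$ by itself, so a small $\eta$ can produce negative weights.

```python
def update_modality_weights(psi, eta):
    """Closed-form weights of Eq. (6.28) for per-modality costs psi = [Psi_1, ..., Psi_m]."""
    m = len(psi)
    total = sum(psi)
    return [1.0 / m + total / (2.0 * m * eta) - p / (2.0 * eta) for p in psi]


def predict_class(x_views, projections, omega):
    """Eq. (6.29): weighted sum of per-modality scores x_i^T M_i, then arg max over classes.

    x_views[i] is the feature vector of modality i, and projections[i] is the
    matching projection matrix stored as rows (feature dimension x classes).
    """
    n_classes = len(projections[0][0])
    scores = [0.0] * n_classes
    for x, M, w in zip(x_views, projections, omega):
        for c in range(n_classes):
            scores[c] += w * sum(x[d] * M[d][c] for d in range(len(x)))
    return max(range(n_classes), key=lambda c: scores[c])


# Hypothetical costs for three modalities; a lower cost yields a larger weight.
psi = [1.0, 2.0, 3.0]
omega = update_modality_weights(psi, eta=5.0)
print(omega)
```

A larger $\eta$ pulls the weights toward the uniform value $1/m$, while a smaller $\eta$ concentrates weight on the modality with the lowest cost.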


Input: The training data $\mathbf{X} = [x_1, \cdots, x_n] \in \mathbb{R}^{n \times d}$,
       the label matrix $\mathbf{Y} = [y_1, \cdots, y_n] \in \mathbb{R}^{n \times c}$ for $\mathbf{X}$,
       the testing sample $x^t$.
Output: The predicted class for $x^t$.
Multi-hypergraph construction.
Update $\mathbf{M}_i$ for each hypergraph.
Initialize $\mathbf{U}_i$ as an identity matrix.
while not converged do
    Update $\mathbf{M}_i$ and $\mathbf{U}_i$ alternately.
Update $\boldsymbol{\omega}$ by Eq. (6.28).
Predict the class of $x^t$ by Eq. (6.29).
return result

Fig. 6.2 The workflow for the sub-hypergraph weight optimization method


6.3 Hypergraph Structure Optimization


Although the above component optimization methods can modify the weights of
hyperedges, vertices, or sub-hypergraphs, they cannot precisely adjust
inappropriate or wrong connections, since the connections between vertices and
hyperedges cannot be changed, i.e., the incidence matrix of the hypergraph is
fixed. To address this challenge and further optimize the hypergraph structure, it
is imperative to investigate how to finely optimize the hypergraph structure and
dynamically learn the high-order relationship. This can be regarded as finding the
optimal hypergraph structure in a hypergraph space, as shown in Fig. 6.3.

In this part, we introduce the dynamic hypergraph structure learning method [6],
and Fig. 6.4 shows the framework of this method. Different from the above methods,
which adjust weights on a fixed structure, this method optimizes the incidence matrix
$\mathbf{H}$ itself.


Fig. 6.3 An illustration of hypergraph structure evolution

The output $\mathbf{F}$ and the incidence matrix $\mathbf{H}$ are jointly optimized by the
dual-optimization method. The objective function of the joint learning can be formulated
as

$$
\arg\min_{\mathbf{F},\, 0 \leq \mathbf{H} \leq 1} \Psi(\mathbf{F}, \mathbf{H}) := \left\{ \Omega(\mathbf{F}, \mathbf{H}) + \lambda \mathcal{R}_{emp}(\mathbf{F}) + \mu \Phi(\mathbf{H}) \right\}. \tag{6.30}
$$


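There are three terms in the objective function, explained as follows:


* First, $\Omega(\mathbf{F}, \mathbf{H})$ is the regularizer related to $\mathbf{F}$ and $\mathbf{H}$. The output $\mathbf{F}$ is the
to-be-learned label vectors of vertices. Therefore, $\mathbf{F}$ is expected to be
smooth on the hypergraph structure, where the commonly used regularizer
of hypergraph smoothness can be written as

$$
\Omega(\mathbf{F}, \mathbf{H}) = \mathrm{tr}\left( \mathbf{F}^\top \left( \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \right) \mathbf{F} \right). \tag{6.31}
$$

However, the regularizer in the previous methods is a function only of $\mathbf{F}$, while
$\mathbf{H}$ is a fixed parameter. Here, the regularizer is a function of both $\mathbf{F}$ and $\mathbf{H}$.

* Second, the empirical loss $\mathcal{R}_{emp}(\mathbf{F})$ is the $\ell_2$-norm between $\mathbf{F}$ and $\mathbf{Y}$.

* Third, $\Phi(\mathbf{H})$ is the regularizer only related to $\mathbf{H}$ to additionally constrain $\mathbf{H}$ to
satisfy the prior knowledge. For instance, given the feature information of data,
the hypergraph structure is expected to preserve smoothness not just in the label
space but in the feature space as well. Let $\mathbf{X}$ denote the features of vertices, and
the regularizer can be formulated as

$$
\Phi(\mathbf{H}) = \mathrm{tr}\left( \mathbf{X}^\top \left( \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \right) \mathbf{X} \right). \tag{6.32}
$$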

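To summarize, the general objective function in Eq. (6.30) for dynamic hypergraph
structure learning is instantiated as

$$
\arg\min_{\mathbf{F},\, 0 \leq \mathbf{H} \leq 1} \Psi(\mathbf{F}, \mathbf{H}) := \mathrm{tr}\left( \left( \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \right) \left( \mathbf{F}\mathbf{F}^\top + \mu \mathbf{X}\mathbf{X}^\top \right) \right)
+ \lambda \| \mathbf{F} - \mathbf{Y} \|^2. \tag{6.33}
$$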
Similar to the previous methods, an alternating optimization algorithm is applied
to solve the dual-optimization problem. The sub-problem about $\mathbf{F}$ has the same
closed-form solution as traditional hypergraph learning [8].
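
As a brief worked step, the closed form can be recovered from Eq. (6.33) with $\mathbf{H}$ held fixed. The derivation below is reconstructed here from the objective and is not quoted from [8]; it assumes $\lambda > 0$ and writes $\Delta = \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2}$, which is symmetric. Only the terms $\mathrm{tr}(\mathbf{F}^\top \Delta \mathbf{F}) + \lambda \| \mathbf{F} - \mathbf{Y} \|^2$ depend on $\mathbf{F}$, so setting the gradient to zero gives

$$
2 \Delta \mathbf{F} + 2 \lambda (\mathbf{F} - \mathbf{Y}) = 0
\quad \Longrightarrow \quad
\mathbf{F} = \left( \mathbf{I} + \frac{1}{\lambda} \Delta \right)^{-1} \mathbf{Y}.
$$

Since $\Delta$ is positive semi-definite, $\mathbf{I} + \Delta / \lambda$ is positive definite and the inverse exists.
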
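The most important difference from the previous methods is the sub-problem
about $\mathbf{H}$, which is written as

$$
\arg\min_{0 \leq \mathbf{H} \leq 1} \mathcal{Q}(\mathbf{H}) = \Omega(\mathbf{H}) + \mu \Phi(\mathbf{H})
= \mathrm{tr}\left( \left( \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \right) \mathbf{K} \right), \tag{6.34}
$$

where $\mathbf{K} = \mathbf{F}\mathbf{F}^\top + \mu \mathbf{X}\mathbf{X}^\top$. The projected gradient method is employed here because
Eq. (6.34) is a complex function of $\mathbf{H}$ with a bound constraint. The gradient is
derived as

$$
\nabla \mathcal{Q}(\mathbf{H}) = \mathbf{J} \left( \mathbf{I} \otimes \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{K} \mathbf{D}_v^{-1/2} \mathbf{H} \right) \mathbf{W} \mathbf{D}_e^{-2}
+ \mathbf{D}_v^{-3/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{K} \mathbf{J} \mathbf{W}
- 2 \mathbf{D}_v^{-1/2} \mathbf{K} \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1}, \tag{6.35}
$$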


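where $\mathbf{J} = \mathbf{1}\mathbf{1}^\top$. The detailed derivation process can be found in [6]. The step size of
learning $\mathbf{H}$ is set as $\alpha$. Since $\mathbf{H}$ is required to be in the range of [0, 1], the projection
$P$ onto the feasible set is conducted after each update. Therefore, $\mathbf{H}$ is updated by

$$
\mathbf{H}_{k+1} = P\left[ \mathbf{H}_k - \alpha \nabla \mathcal{Q}(\mathbf{H}_k) \right], \tag{6.36}
$$

where

$$
P[h_{ij}] = \begin{cases} h_{ij} & \text{if } 0 \leq h_{ij} \leq 1, \\ 0 & \text{if } h_{ij} < 0, \\ 1 & \text{if } h_{ij} > 1. \end{cases} \tag{6.37}
$$

In this way, we can alternately optimize $\mathbf{F}$ and $\mathbf{H}$ until the objective function
converges.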
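
The update of Eqs. (6.36) and (6.37) is a gradient step followed by element-wise clipping. The sketch below shows only that step; the gradient values are hypothetical placeholders standing in for Eq. (6.35), and the function names are chosen here for illustration.

```python
def project(H):
    """Element-wise projection P onto the feasible set [0, 1] (Eq. 6.37)."""
    return [[min(1.0, max(0.0, h)) for h in row] for row in H]


def projected_gradient_step(H, grad, step):
    """One update H <- P[H - step * grad] (Eq. 6.36)."""
    moved = [[h - step * g for h, g in zip(h_row, g_row)]
             for h_row, g_row in zip(H, grad)]
    return project(moved)


H = [[0.9, 0.1], [0.5, 0.0]]
grad = [[-2.0, 1.0], [0.5, 3.0]]   # hypothetical gradient values, not Eq. (6.35)
H_next = projected_gradient_step(H, grad, step=0.1)
print(H_next)
```

Entries pushed outside [0, 1] by the gradient step are clipped back to the nearest bound, so the incidence matrix always stays feasible.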

The dynamic hypergraph structure learning method can outperform traditional
hypergraph learning consistently. This is because the dynamic
hypergraph structure can fit the data better and formulate the high-order correlation
more effectively. Furthermore, both the feature and the label information are
used for the hypergraph structure optimization. Therefore, the learned hypergraph
structure is smooth on both the feature space and the label space. In other words,
the vertices with the same labels have stronger high-order connections, which
benefits the downstream task. We also note that the above dynamic hypergraph
structure optimization method has relatively high computational complexity, as
it optimizes the whole incidence matrix $\mathbf{H}$.


6.4 Incremental Learning on Growing Data 


Most of the existing methods consider static structures with fixed sets of vertices
and edges, while the data are generally dynamic in real-world applications. Under
such circumstances, vertices and connections can be added or removed, and
the vertex attributes and connection weights change during the dynamic procedure.
Generally, there are two typical ways of dynamic structure learning, i.e., using
recurrent architectures [9, 10] and capturing temporal patterns [11, 12]. However,
the efficient learning of temporally growing structures, where the vertex and edge
sets expand over time, has not been explored yet. For example, in a citation network,
new publications and citation links are continuously added to the network.

The incremental subgraph is the subgraph with the newly appeared vertices
and related new edges in the given growing graph at each time step. The edges
connecting the vertices from the same incremental subgraph are denoted as
intra-edges, while the edges connecting the vertices from different incremental subgraphs
are denoted as inter-edges. The incremental learning method aims to update the
model based on the incremental subgraphs at each time step and perform consistently
on the entire graph. The challenge of the incremental graph learning method
is how to design an efficient strategy to update the model with incremental data and
maintain the performance on the whole dataset.
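
The distinction between intra-edges and inter-edges can be made concrete with a small sketch. The `arrival` dictionary, which records the time step at which each vertex first appeared, is a hypothetical bookkeeping structure introduced here for illustration.

```python
def split_incremental_edges(new_edges, arrival):
    """Partition the edges that appear at one time step into intra- and inter-edges.

    An edge is an intra-edge when both endpoints belong to the same incremental
    subgraph (same arrival step) and an inter-edge otherwise.
    """
    intra, inter = [], []
    for u, v in new_edges:
        (intra if arrival[u] == arrival[v] else inter).append((u, v))
    return intra, inter


# Vertices a and b arrived at step 0; c and d arrive at step 1.
arrival = {"a": 0, "b": 0, "c": 1, "d": 1}
edges_t1 = [("c", "d"), ("a", "c"), ("b", "d")]
intra, inter = split_incremental_edges(edges_t1, arrival)
print(intra, inter)
```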

The main differences between incremental graph learning and existing incremen- 
tal learning methods are as follows: 


* Incremental learning on growing graphs should store the observed vertices, 
which may be connected with the newly coming vertices, while existing incre- 
mental learning methods always drop the old samples under some scenarios. 

* Considering the effect of the inter-edges on training, it is also essential to use 
previous data when updating models with newly appeared data. 


There are two straightforward solutions for incremental graph learning. First,
static graph learning methods can be applied directly to the whole graph at each
time step, which suffers from a high computation cost. Second, the model can learn
only from the incremental subgraph, which biases it toward the newly arriving
subgraphs and loses the information of the inter-edges.

In this section, we introduce incremental graph learning (IGL) on growing
data. During training, a graph $\mathcal{G}^L$ with smaller vertex and edge sets
is generated from the growing graph $\{\mathcal{G}_t\}$ for updating the current model, which can
be implemented by existing GNN methods for the specified graph learning task and
can perform on the entire observed graph at any time. A restricted number of vertices
and edges from the old graph are selected and combined with the new subgraph
into $\mathcal{G}^L$. Therefore, $\mathcal{G}^L$ is unbiased with respect to the entire graph, and enough
inter-edges are preserved. The overview of IGL is shown in Fig. 6.5.

To address the issues of subgraph bias and missing inter-edges, the following
conditions should be considered for generating $\mathcal{G}^L$.


Unbiased Estimation of Neighboring Aggregation To alleviate the bias of the subgraph,
the aggregation results of vertices in $\mathcal{G}^L$ should be unbiased estimations of those in
the entire graph, i.e., $\forall v \in \mathcal{V}_t$,

$$
\mathbb{E}\left( \mathrm{agg}\left(v, \mathcal{N}(v) \cap \mathcal{V}^L\right) \mid v \in \mathcal{V}^L \right) = \mathrm{agg}\left(v, \mathcal{N}(v)\right), \tag{6.38}
$$

where $\mathrm{agg}(v, \mathcal{M})$ is the aggregator function of the GNN to aggregate vertex embeddings
from $\mathcal{M}$ to $v$, and $\mathcal{N}(v) = \{u \in \mathcal{V}_t \mid (u, v) \in \mathcal{E}_t\}$ is the neighborhood set of
vertex $v$. Thus, $\mathcal{N}(v) \cap \mathcal{V}^L$ represents the sampled neighboring vertices in $\mathcal{G}^L$.


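Preservation of Inter-edges Since missing inter-edges may seriously affect
training, we aim at preserving more edges of $\mathcal{E}_t^{inter}$ in $\Delta\mathcal{E}^L$, which can be
formulated as

$$
\max_{\Delta\mathcal{E}^L} \left| \Delta\mathcal{E}^L \cap \mathcal{E}_t^{inter} \right|, \quad \text{s.t.} \ \left| \Delta\mathcal{E}^L \right| \leq E_{max}. \tag{6.39}
$$

The edge preservation can be posed either as the explicit optimization problem in
Eq. (6.39) or as a sampling problem that gives priority to vertices with higher degrees, so that
$P(u \in \mathcal{V}^L) \propto \left| \{ (u, v) \in \mathcal{E}_t^{inter} \mid v \in \mathcal{V}_t^{new} \} \right|$.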


IGL is based on the unbiased and edge-preserved conditions. In the presentation
of the method, we follow the memory constraint $V_{max}$ and set $E_{max} = (|\mathcal{V}_t^{new}| +
V_{max})^2 - |\mathcal{E}_t^{inter}|$ by default. The generated edges can be uniformly sampled if
a smaller $E_{max}$ is required. The sample-based strategy is presented to select a
subgraph from the previous graph for learning. The cluster-based strategy
is then presented to construct a cluster graph that balances the unbiased and
edge-preserved conditions. The presented strategies are illustrated in Fig. 6.6.


Fig. 6.6 An illustration of the sample-based and cluster-based strategies


(1) Sample-Based Strategy

The strategy of sampling a representative subgraph from previous data based on the
required conditions is studied first. We assume that a subset $\Delta\mathcal{V}^L$ of size
$V_{max}$ is sampled from $\mathcal{V}_{t-1}$, and all the related edges are preserved, i.e.,

$$
\Delta\mathcal{V}^L = \mathrm{Sample}(\mathcal{V}_{t-1}, V_{max}), \qquad
\Delta\mathcal{E}^L = \{ (u, v) \in \mathcal{E}_t \mid u \in \Delta\mathcal{V}^L, v \in \mathcal{V}^L \}, \tag{6.40}
$$

where Sample() denotes the sampling function. Considering the required conditions,
we explore the following pragmatic methods for appropriate sampling:


* Random selection targets the unbiased condition by uniformly selecting $V_{max}$
vertices from $\mathcal{V}_{t-1}$. However, it cannot preserve enough edges for efficient
training, especially in sparse graphs.

* Random jump is a traversal-based sampling method, and we adapt it in the following
steps. Starting from any vertex in $\mathcal{V}_t^{new}$, we either randomly walk to a
neighboring vertex in $\mathcal{V}_{t-1}$ with probability $p$ and select it, or randomly jump
to a vertex in $\mathcal{V}_t^{new}$ with probability $(1 - p)$. We repeat this until the sampled set
is full. It has been proved that the probability of sampling a vertex tends to be proportional
to its degree, which works under the edge-preserved condition.

* Degree-based selection targets the edge-preserved condition by sampling vertices
with priority to those connected with more inter-edges. Let $D_t(u) = |\{(u, v) \in
\mathcal{E}_t\}|$ be the degree of $u$, and we define $D_t^{new}(u)$, $\forall u \in \mathcal{V}_{t-1}$, as
the new degree of vertices to measure their closeness to the new subgraph through
inter-edges. We then select the top-$V_{max}$ vertices in $\mathcal{V}_{t-1}$ by their new degrees.


The above methods each take into consideration only part of the required conditions. It
can be proved that, ignoring the ideal case when all the vertices in $\mathcal{V}_{t-1}$ connect
with the same number of vertices in $\mathcal{V}_t^{new}$, sampling in Eq. (6.40) satisfies the two
required conditions when all the vertices have been sampled, i.e., joint training.
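
The three sampling options above can be sketched as follows. This is an illustrative reading, not the reference implementation: the function names and arguments are assumptions made here, the degree-based score is taken to be the number of inter-edges incident to each old vertex, and `max_steps` is a safety cap added for the sketch. Vertices are assumed to be sortable (for example integers) so that results are reproducible with a seeded generator.

```python
import random


def random_selection(old_vertices, v_max, rng):
    """Uniformly pick up to v_max old vertices (targets the unbiased condition)."""
    return rng.sample(sorted(old_vertices), min(v_max, len(old_vertices)))


def degree_based_selection(old_vertices, inter_edges, v_max):
    """Keep the v_max old vertices touching the most inter-edges (assumed score)."""
    score = {u: 0 for u in old_vertices}
    for u, v in inter_edges:
        for w in (u, v):
            if w in score:
                score[w] += 1
    return sorted(old_vertices, key=lambda u: (-score[u], u))[:v_max]


def random_jump(old_vertices, new_vertices, adjacency, v_max, p, rng, max_steps=10_000):
    """Walk to an old neighbour with probability p and select it; otherwise jump to a new vertex."""
    target = min(v_max, len(old_vertices))
    sampled = []
    current = rng.choice(sorted(new_vertices))
    for _ in range(max_steps):
        if len(sampled) >= target:
            break
        old_nbrs = [u for u in adjacency.get(current, ()) if u in old_vertices]
        if old_nbrs and rng.random() < p:
            current = rng.choice(sorted(old_nbrs))
            if current not in sampled:
                sampled.append(current)
        else:
            current = rng.choice(sorted(new_vertices))
    return sampled
```

Random selection ignores the edges entirely, while the other two functions bias the sample toward old vertices that are well connected to the new subgraph.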


(2) Cluster-Based Strategy 

The sample-based strategy selects a subgraph from the previous graph for learning.
However, in such a process, $\mathcal{V}_{t-1}$ is not completely covered, and some important
vertices might be dropped. Then, the selected subgraph cannot fully
communicate with the new subgraph. The sampling assumption that $\mathcal{G}^L$ must
be a subgraph of $\mathcal{G}_t$ is therefore relaxed, and a cluster graph is constructed. Technically, we
first arrange the vertices in $\mathcal{V}_{t-1}$ into $K$ cluster sets $\{\mathcal{C}_i^{t-1}\}_{i=1}^{K}$ with centers
$\{c_i^{t-1}\}_{i=1}^{K}$ given by the average values of the clusters. We set the number of clusters
$K = V_{max}$. The cluster graph is therefore defined as

$$
\begin{aligned}
\Delta\mathcal{V}^L &= \{ c_i^{t-1} \}_{i=1}^{K}, \\
\Delta\mathcal{E}^L &= \{ (c_i^{t-1}, v) \mid v \in \mathcal{V}_t^{new}, \exists u \in \mathcal{C}_i^{t-1}, (u, v) \in \mathcal{E}_t^{inter} \} \\
&\quad \cup \{ (c_i^{t-1}, c_j^{t-1}) \mid \exists u \in \mathcal{C}_i^{t-1}, w \in \mathcal{C}_j^{t-1}, (u, w) \in \mathcal{E}_{t-1} \},
\end{aligned} \tag{6.41}
$$

which suggests that the cluster centers be added as new cluster vertices, and the
edges connecting to any vertex in $\mathcal{V}_{t-1}$ be directly transferred to the corresponding
cluster vertex. It is noted that the two additional edge sets in Eq. (6.41) represent
$\mathcal{E}_t^{inter}$ and $\mathcal{E}_{t-1}$, respectively.
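
The construction in Eq. (6.41) amounts to replacing every old vertex by its cluster vertex and deduplicating the resulting edges. The sketch below is illustrative only: `cluster_of` (a map from old vertex to cluster index) and the `("c", k)` naming of cluster vertices are assumptions made here, and self-loops between a cluster and itself are dropped as a design choice of this sketch.

```python
def build_cluster_graph(cluster_of, new_vertices, inter_edges, old_edges):
    """Sketch of Eq. (6.41): transfer edges of old vertices to their cluster vertices."""
    delta_edges = set()
    for u, v in inter_edges:
        # Orient the pair so that u is the old vertex and v is the new one.
        if u in new_vertices:
            u, v = v, u
        delta_edges.add((("c", cluster_of[u]), v))
    for u, w in old_edges:
        ci, cj = cluster_of[u], cluster_of[w]
        if ci != cj:  # self-loops are skipped in this sketch
            delta_edges.add((("c", min(ci, cj)), ("c", max(ci, cj))))
    delta_vertices = {("c", k) for k in set(cluster_of.values())}
    return delta_vertices, delta_edges


cluster_of = {0: 0, 1: 0, 2: 1}          # old vertices 0 and 1 share cluster 0
delta_v, delta_e = build_cluster_graph(
    cluster_of,
    new_vertices={10},
    inter_edges=[(0, 10), (1, 10), (10, 2)],
    old_edges=[(0, 1), (1, 2)],
)
print(sorted(delta_e, key=str))
```

Because several old vertices collapse into one cluster vertex, the cluster graph keeps the connectivity pattern of the old graph while using at most $K$ vertices for it.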


Due to the continued growth of the graph, direct clustering on the entire graph is
time-consuming. For an approximate but efficient clustering with a balanced size,
we first cluster the new vertices $\mathcal{V}_t^{new}$ into cluster sets $\{\Delta\mathcal{C}_i\}_{i=1}^{K}$
with centers $\{\Delta c_i\}_{i=1}^{K}$. The bipartite matching algorithm is applied to optimize a
bijective matching function $M(\cdot) : \{1, \ldots, K\} \to \{1, \ldots, K\}$ for the objective
$\min_{M(\cdot)} \sum_{k=1}^{K} \| c_k^{t-1} - \Delta c_{M(k)} \|$, which assigns new clusters to nearby old
clusters. Then, we merge the clusters as $\mathcal{C}_k^t = \mathcal{C}_k^{t-1} \cup \Delta\mathcal{C}_{M(k)}$ and update the values
of the centers $c_k^t$.
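
The matching and merging step can be illustrated with a small sketch. It assumes the objective is the sum of Euclidean distances between matched centers, and it uses a brute-force search over permutations as a stand-in for a proper bipartite matching solver, which is only practical for very small $K$. The function names and the size-weighted center update are assumptions made here.

```python
import math
from itertools import permutations


def match_clusters(old_centers, new_centers):
    """Find the bijection k -> M(k) minimising the total center distance (brute force)."""
    K = len(old_centers)
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(K)):
        cost = sum(math.dist(old_centers[k], new_centers[perm[k]]) for k in range(K))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


def merge_center(old_center, old_size, new_center, new_size):
    """Center of the merged cluster as the size-weighted mean of the two centers."""
    total = old_size + new_size
    return [(old_size * a + new_size * b) / total for a, b in zip(old_center, new_center)]


old_centers = [[0.0, 0.0], [10.0, 10.0]]
new_centers = [[9.0, 9.0], [1.0, 1.0]]
mapping, cost = match_clusters(old_centers, new_centers)
print(mapping, cost)
```

For larger $K$, an assignment solver such as the Hungarian algorithm would replace the permutation search while keeping the same objective.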

In summary, incremental graph learning (IGL) is a general framework for efficient
learning on growing graphs in an incremental manner, which has the following
advantages. First, IGL is well suited to real-world applications, since dynamic
graphs are common. Second, the sample-based and cluster-based strategies
significantly improve efficiency as a large-scale graph grows. However,
only the addition of nodes and edges is considered, while deletion is
ignored, which limits the application of the method. General dynamic patterns
are worth studying in future work.


6.5 Summary 


In this chapter, we introduce hypergraph structure evolution methods, i.e., hyper- 
edge weight optimization, vertex weight optimization, sub-hypergraph weight 
optimization, dynamic hypergraph learning, and the techniques for incremental 
learning on growing graphs. The hyperedge weight optimization adjusts weights 
of each hyperedge for different contributions, while the vertex weight optimization 
considers the different importance of vertices on hypergraph. The sub-hypergraph 
weight optimization method further combines multiple hypergraphs for multi-modal 
data with learned weights. Dynamic hypergraph learning optimizes the hypergraph 
structure by modifying the inappropriate connections, which can partially solve 
the missing and incorrect connection issue. Finally, we introduce the incremental 
learning method on growing graphs, which can update the data structure under the 
incremental scenario. 

It is noted that the optimization of a hypergraph, whether of its components or its
structure, brings extra computational cost and can lead to high computational
complexity in practice. How to effectively and efficiently adjust the hypergraph
structure is still a challenging problem, which requires further investigation in
the future.

Chapter 7
Neural Networks on Hypergraph


Abstract With the development of deep learning on high-order correlations, 
hypergraph neural networks have received much attention in recent years. Generally, 
the neural networks on hypergraph can be divided into two categories, including the 
spectral-based methods and the spatial-based methods. For the spectral-based meth- 
ods, the convolution operation is formulated in the spectral domain of graph, and we 
introduce the typical spectral-based methods, including hypergraph neural networks 
(HGNN), hypergraph convolution with attention (Hyper-Atten), and hyperbolic 
hypergraph neural network (HHGNN), which extend hypergraph computation to 
hyperbolic spaces beyond the Euclidean space. For the spatial-based methods, the 
convolution operation is defined on groups of spatially close vertices. We then
present two spatial-based hypergraph neural networks: the general hypergraph neural
networks (HGNN+) and the dynamic hypergraph neural networks (DHGNN). Additionally,
there are several convolution methods that attempt to reduce the hypergraph
structure to a graph structure, so that existing graph convolution methods
can be directly deployed. Lastly, we analyze the association and comparison
between hypergraph and graph in the two areas described above (spectral-based,
spatial-based), further demonstrating the ability and advantages of hypergraph in
constructing and computing higher-order correlations in the data.


7.1 Introduction 


Hypergraph has demonstrated its ability to model and learn complex correlations in 
recent years. Zhou et al. [1] introduced the hypergraph learning, which conducts 
transductive learning and propagates information on the hypergraph structure. 
Transductive inference on the hypergraph aims to minimize the label difference 
between vertices with stronger connections. There has been extensive development 
and application of hypergraph learning in several fields over the past few years. 

In addition, hypergraph has been investigated in deep learning applications.
Based on the hypergraph Laplacian and the Chebyshev formula, Feng et al. [2]
first introduced hypergraph neural networks (HGNN). Yadati et al. [3] synthesized the
hypergraph Laplacian using predictions, while Bai et al. [4] defined two
neural hypergraph operators based on [5, 6]. However, these methods introduce only
vertex functions: in effect, they construct a simple weighted graph and apply mature
graph learning algorithms rather than implementing high-order learning algorithms. The
lack of powerful tools for expressing hyperstructure, together with the wealth of graph
literature, motivated the work of [7]. Additionally, recent successes in graph representation
learning have been achieved by using neural operators (convolution, attention,
spectral, etc.). Generally, the neural networks on hypergraph can be divided into two
categories, including the spectral-based methods and the spatial-based methods.

For the spectral-based methods, Feng et al. [2] introduced the hypergraph
neural networks (HGNN) for modeling and learning complex correlations beyond pairwise
ones. Different from the traditional graph neural networks (GNN), HGNN
learns its data representation by iteratively propagating information in a
vertex-hyperedge-vertex pattern. Additionally, the hypergraph Laplacian is first
approximated and introduced into the deep hypergraph learning method to speed up
the learning process. Following [2], Bai et al. [4] developed an attention module
based on hypergraph convolution patterns (Hyper-Atten). Hyper-Atten introduced
a hyperedge-vertex attention learning module that adaptively identifies the
importance of different vertices in a hyperedge, thus revealing the intrinsic correlations
between vertices.

For the spatial-based methods, Atwood et al. [8] made use of transition matrices
to determine where vertices are located. The generalization of convolution in
the spatial domain is achieved using Gaussian mixture models based on local
path operators. An attention-based architecture was built in [9]
for analyzing vertices on graphs using the attention mechanism. Dynamic changes
in the hypergraph structure were taken into consideration in [6]. The framework
introduced in [6] is more versatile than HGNN [2]. A unified hypergraph is
then constructed by merging the correlations from different modalities/types using
an adaptive hyperedge grouping strategy. To learn a general data representation for
various tasks, a hypergraph convolution scheme [6] was performed in the spatial
domain.

Hypergraph spectral graph theory [10] has been explored far less in other 
methods. The concept of hypergraph learning was first introduced by Zhou et al. [1], 
where it was presented as a propagation process. The Laplacian matrix, however, 
is equivalent to pairwise operations according to [11]. There have been several 
studies addressing non-pairwise relationships since then, including developing non- 
linear Laplacian operators [12, 13], learning the optimal parameters of hyperedges 
[13, 14], as well as utilizing random walking techniques [10]. Hyperedges can be 
regarded as connectors in these algorithms, which explicitly break the bipartite 
property of hypergraph by focusing on vertices. 

In this chapter, we systematically introduce the above three types of neural
networks on hypergraph and show the comparison between graph neural networks
and hypergraph neural networks from both spectral and spatial aspects. Part of the
work introduced in this chapter has been published in [2, 15, 16].

7.2 Spectral-Based Neural Networks on Hypergraph


The spectral neural network methods have attracted much attention since Bruna
et al. [17] and Kipf et al. [18] simplified them into a graph convolutional network
pattern. The data are transformed from the common domain to the spectral domain,
processed there according to graph theory and the convolution theorem, and
then transformed back to the common domain. In other words, we first convert
the signal from the common domain to the frequency domain (the Fourier transform)
and multiply it by the filter there. Then, we convert the result
of the multiplication back to the common domain (the inverse Fourier
transform). We will present spectral-based hypergraph neural network
methods, including hypergraph neural networks (HGNN) [2], hypergraph
convolution with attention (Hyper-Atten) [4], and hyperbolic hypergraph neural
networks (HHGNN) [19]. In particular, HHGNN extends hypergraph learning to
the hyperbolic spaces beyond the Euclidean space.


7.2.1 Hypergraph Neural Networks 


Given a hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \Delta)$ with $N$ vertices, the hypergraph Laplacian
$\Delta$ is an $N \times N$ positive semi-definite matrix. The orthonormal eigenvectors
$\Phi = [\phi_1, \ldots, \phi_N]$ and a diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$, which
contains the corresponding non-negative eigenvalues, are obtained by employing
the eigendecomposition $\Delta = \Phi \Lambda \Phi^\top$. $\hat{x} = \Phi^\top x$ defines the Fourier transform for
a signal $x = (x_1, \ldots, x_N)$ in the hypergraph. It is assumed that the eigenvectors
represent the Fourier bases and the eigenvalues represent the frequencies. The
spectral convolution of signal $x$ and filter $g$ can be denoted as

$$
g \star x = \Phi\left( (\Phi^\top g) \odot (\Phi^\top x) \right) = \Phi\, g(\Lambda)\, \Phi^\top x, \tag{7.1}
$$


where $\odot$ denotes the element-wise Hadamard product and $g(\Lambda) = \mathrm{diag}(g(\lambda_1), \ldots,
g(\lambda_N))$ indicates a function of the Fourier coefficients. However, in the forward and
inverse Fourier transforms, the computational cost is $O(N^2)$, which is high. To solve
this problem, Defferrard et al. [20] parameterize $g(\Lambda)$ with $K$-order polynomials,
and one such polynomial is the truncated Chebyshev expansion. Chebyshev
polynomials $T_k(x)$ are computed by the recurrence $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$,
with $T_0(x) = 1$ and $T_1(x) = x$. After that, the convolution can be computed by

$$
g \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Delta}) x, \tag{7.2}
$$

where $T_k(\tilde{\Delta})$ denotes the Chebyshev polynomial of order $k$ with the scaled Laplacian
$\tilde{\Delta} = \frac{2}{\lambda_{max}} \Delta - \mathbf{I}$. In Eq. (7.2), only matrix powers, additions, and multiplications
are needed instead of the expensive computation of Laplacian eigenvectors, thus reducing
the computational complexity even further. Since the Laplacian in hypergraph can
already represent the high-order correlations among nodes, the order of the convolution
operation can be further limited to $K = 1$. It is suggested by Kipf et al. [18] that
$\lambda_{max} \approx 2$ for the scale adaptability of neural networks. After that, the convolution
operation can be simplified to

$$
g \star x \approx \theta_0 x - \theta_1 \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} x, \tag{7.3}
$$


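where $\theta_0$ and $\theta_1$ represent the parameters of the filters over all nodes. In addition, a single
parameter $\theta$ is used to avoid the overfitting problem, which is defined as

$$
\begin{cases}
\theta_1 = -\frac{1}{2}\theta, \\
\theta_0 = \frac{1}{2}\theta\, \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2}.
\end{cases} \tag{7.4}
$$

Thereafter, the convolution process can be simplified to the following function:

$$
\begin{aligned}
g \star x &\approx \frac{1}{2}\theta\, \mathbf{D}_v^{-1/2} \mathbf{H} (\mathbf{W} + \mathbf{I}) \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} x \\
&\approx \theta\, \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} x,
\end{aligned} \tag{7.5}
$$

where $(\mathbf{W} + \mathbf{I})$ can be regarded as the weight of the hyperedges. In the initialization
of $\mathbf{W}$, the hyperedges can all be assigned equal weights, i.e., $\mathbf{W}$ is initialized as an identity matrix.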
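
The Chebyshev recurrence and the effect of truncating the expansion can be checked on a single eigenvalue. The sketch below works in the spectral domain on scalars only; the function names and the sample coefficients are assumptions made here for illustration. With $K = 1$ and $\lambda_{max} = 2$, the filter response reduces to $\theta_0 + \theta_1 (\lambda - 1)$, which is the scalar counterpart of Eq. (7.3).

```python
import math


def chebyshev(k, x):
    """T_k(x) from the recurrence T_k = 2x T_{k-1} - T_{k-2}, with T_0 = 1 and T_1 = x."""
    t_prev, t_curr = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(2, k + 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def spectral_response(thetas, lam, lam_max=2.0):
    """Filter response sum_k theta_k T_k(lam_tilde) at one Laplacian eigenvalue lam."""
    lam_tilde = 2.0 * lam / lam_max - 1.0   # scaled eigenvalue in [-1, 1]
    return sum(theta * chebyshev(k, lam_tilde) for k, theta in enumerate(thetas))


# Sanity check against the closed form T_k(x) = cos(k * arccos(x)) on [-1, 1].
print(chebyshev(3, 0.5), math.cos(3 * math.acos(0.5)))
# First-order filter (K = 1) with hypothetical coefficients.
print(spectral_response([0.3, 0.7], lam=1.5))
```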

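Given a hypergraph signal $\mathbf{X}^t$ for the $t$-th layer, the hyperedge convolution
layer HGNNConv can be formulated by

$$
\mathbf{X}^{t+1} = \sigma\left( \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{X}^t \Theta \right), \tag{7.6}
$$

where $\Theta$ is the parameter to be learned during the training process and $\sigma$ denotes a
nonlinear activation function. To extract features from a hypergraph, the filter $\Theta$ is
applied to the vertices. After convolution, we obtain $\mathbf{X}^{t+1}$, which can be used for
further processing.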
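
Eq. (7.6) can be read as three consecutive stages: a vertex feature transform, a gathering of vertex features into hyperedges, and an aggregation of hyperedge features back to vertices. The dependency-free sketch below follows that reading on small lists of lists. It is an illustration under stated assumptions rather than the reference implementation: the function name is chosen here, ReLU is assumed for $\sigma$, and every vertex is assumed to belong to at least one hyperedge so that all degrees are positive.

```python
def hgnn_conv(X, H, w, Theta, activation=lambda z: max(0.0, z)):
    """One HGNNConv layer, Eq. (7.6), written as explicit message passing.

    X: n x f_in vertex features, H: n x m incidence matrix,
    w: m hyperedge weights (diagonal of W), Theta: f_in x f_out filter.
    """
    n, m = len(H), len(H[0])
    f_in, f_out = len(Theta), len(Theta[0])
    d_v = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]   # vertex degrees
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]          # hyperedge degrees
    # 1) Vertex feature transform: D_v^{-1/2} X Theta
    Z = [[sum(X[v][i] * Theta[i][j] for i in range(f_in)) / d_v[v] ** 0.5
          for j in range(f_out)] for v in range(n)]
    # 2) Hyperedge feature gathering: W D_e^{-1} H^T Z
    E = [[w[e] / d_e[e] * sum(H[v][e] * Z[v][j] for v in range(n))
          for j in range(f_out)] for e in range(m)]
    # 3) Vertex feature aggregating: sigma(D_v^{-1/2} H E)
    return [[activation(sum(H[v][e] * E[e][j] for e in range(m)) / d_v[v] ** 0.5)
             for j in range(f_out)] for v in range(n)]


# Three vertices joined by a single hyperedge: the layer averages their features.
out = hgnn_conv(X=[[1.0], [2.0], [3.0]], H=[[1], [1], [1]], w=[1.0], Theta=[[1.0]])
print(out)
```

A practical implementation would use dense or sparse matrix products instead of Python loops; the loops are kept here only to make the three stages visible.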

The framework of the abovementioned HGNN model is shown in Fig. 7.1.
HGNN is able to address the challenges of learning representations for complex
data by incorporating such data structures into a hypergraph, which is more flexible
and effective when confronting practical data.

The HGNN calculation stages are shown in Fig. 7.2, and the three processes
map directly to functions. We can observe that there are vertex feature
transform, hyperedge feature gathering, and vertex feature aggregating steps in this
framework.

7.2.2 Hypergraph Convolution and Hypergraph Attention


Based on the study of hypergraph neural networks [2], Bai et al. [4] introduced
hypergraph convolution and hypergraph attention (Hyper-Atten) by introducing an
attention mechanism into the framework.

Hypergraph convolution already assigns an explicit magnitude of importance to the
afferent and efferent information flow of a given vertex, because the transition probability
between vertices takes non-binary values. However, this built-in attention is fixed
once the graph structure (the incidence matrix $\mathbf{H}$) is given; it is not learned. The goal
of hypergraph attention is instead to learn a dynamic incidence matrix, since a dynamic
transition matrix reveals the intrinsic relationship between vertices more easily than a
fixed incidence matrix. An attention learning module can be imposed on $\mathbf{H}$: rather than
treating each vertex as either connected or not connected by a hyperedge, the module
assigns non-binary, real values that measure the degree of connectivity. Following [6],
when the vertex set and the edge set are comparable, the attention score between a given
vertex $x_i$ and its associated hyperedge $x_j$ can be written as

$$
\mathbf{H}_{ij} = \frac{\exp\left( \sigma\left( \mathrm{sim}(x_i \mathbf{P}, x_j \mathbf{P}) \right) \right)}{\sum_{k \in \mathcal{N}_i} \exp\left( \sigma\left( \mathrm{sim}(x_i \mathbf{P}, x_k \mathbf{P}) \right) \right)}, \tag{7.7}
$$

where $\sigma(\cdot)$ is a nonlinear activation function, the weight matrix between the $l$-th
and $(l+1)$-th layers is denoted as $\mathbf{P} \in \mathbb{R}^{F^{(l)} \times F^{(l+1)}}$, and $\mathcal{N}_i$ is the neighborhood
set of $x_i$. The pairwise similarity is computed with the similarity function
$\mathrm{sim}(\cdot)$:

$$
\mathrm{sim}(x_i, x_j) = \mathbf{a}^\top [x_i \,\|\, x_j]. \tag{7.8}
$$

The operation $[\cdot \,\|\, \cdot]$ indicates concatenation, and $\mathbf{a}$ is a weight vector for
outputting a scalar similarity value.

When following Eq. (7.6) to learn the intermediate embedding of vertices layer
by layer, hypergraph attention also propagates gradients to $\mathbf{H}$ in addition to $\mathbf{X}^t$
and $\Theta$. Eq. (7.7) gives the share of hyperedge $x_j$ among the neighbors of
the vertex $x_i$, which indicates the relative importance of $x_j$ to $x_i$. More discriminative
embeddings can be learned by this probabilistic model, and the relationship between
vertices can be described more accurately.
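
The attention score of Eqs. (7.7) and (7.8) is a softmax over the neighborhood of a vertex. The sketch below assumes the inputs have already been projected by $\mathbf{P}$ and uses LeakyReLU for $\sigma$; both choices, along with the function names and sample values, are assumptions made here since the text only requires a nonlinear activation.

```python
import math


def leaky_relu(z, slope=0.2):
    """Assumed choice for the nonlinear activation sigma."""
    return z if z > 0 else slope * z


def attention_scores(x_i, neighbors, a):
    """Softmax-normalised scores of Eq. (7.7) with sim(x_i, x_j) = a^T [x_i || x_j]."""
    logits = [leaky_relu(sum(w * f for w, f in zip(a, x_i + x_j))) for x_j in neighbors]
    peak = max(logits)                       # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


x_i = [1.0, 0.0]                              # projected vertex feature
neighbors = [[1.0, 0.0], [0.0, 1.0]]          # projected features of associated hyperedges
a = [0.5, 0.5, 1.0, -1.0]                     # hypothetical weight vector
scores = attention_scores(x_i, neighbors, a)
print(scores)
```

The scores in each neighborhood sum to one, so they can replace the binary entries of the corresponding row of $\mathbf{H}$.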

In order to further enhance the capability of representation learning, the method
applies the hypergraph attention mechanism on top of the basic hypergraph
convolution formulation.

7.2.3 Hyperbolic Hypergraph Neural Networks


The hyperbolic space is a manifold with constant negative Gaussian curvature
everywhere, which has several models. Similar to [21, 22], the work is based on
the Poincaré ball model because it is well suited for gradient-based optimization. The
Poincaré ball model with constant negative curvature $-1/k$ $(k > 0)$ corresponds
to the Riemannian manifold $(\mathbb{P}^{n,k}, g^{\mathbb{P}})$. $\mathbb{P}^{n,k} = \{x \in \mathbb{R}^n : \|x\| < 1\}$ is an open
$n$-dimensional unit ball, where $\|\cdot\|$ denotes the Euclidean norm. Its metric tensor
is $g_x^{\mathbb{P}} = (\lambda_x^k)^2 g^E$, where $\lambda_x^k = \frac{2}{1 - k\|x\|^2}$ is the conformal factor and $g^E = \mathbf{I}_n$ is
the Euclidean metric tensor. Then, we define the Möbius addition of two points
$x, y \in \mathbb{P}^{n,k}$ as follows:

$$
x \oplus_k y = \frac{\left(1 + 2k\langle x, y\rangle + k\|y\|^2\right)x + \left(1 - k\|x\|^2\right)y}{1 + 2k\langle x, y\rangle + k^2\|x\|^2\|y\|^2}. \tag{7.9}
$$

The distance between two points $x, y \in \mathbb{P}^{n,k}$ is calculated by integration of the
metric tensor, which is given as

$$
d(x, y) = \frac{2}{\sqrt{k}} \tanh^{-1}\left( \sqrt{k}\, \| -x \oplus_k y \| \right). \tag{7.10}
$$


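Here, we denote $\mathcal{T}_x\mathbb{P}^{n,k}$ as the tangent (Euclidean) space centered
at any point $x$ in the hyperbolic space. For the tangent vector $z \neq 0$ and the point
$y \neq 0$, the exponential map $\exp_x^k : \mathcal{T}_x\mathbb{P}^{n,k} \to \mathbb{P}^{n,k}$ and the logarithmic map
$\log_x^k : \mathbb{P}^{n,k} \to \mathcal{T}_x\mathbb{P}^{n,k}$ are given for $y \neq x$ by

$$
\exp_x^k(z) = x \oplus_k \left( \tanh\left( \frac{\sqrt{k}\, \lambda_x^k \|z\|}{2} \right) \frac{z}{\sqrt{k}\, \|z\|} \right) \tag{7.11}
$$

and

$$
\log_x^k(y) = \frac{2}{\sqrt{k}\, \lambda_x^k} \tanh^{-1}\left( \sqrt{k}\, \| -x \oplus_k y \| \right) \frac{-x \oplus_k y}{\| -x \oplus_k y \|}. \tag{7.12}
$$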


The transformation between the tangent space and the hyperbolic space is shown
in Fig. 7.3. By leveraging the exp and log maps, we can use the
tangent space $\mathcal{T}_x\mathbb{P}^{n,k}$ to perform Euclidean operations such as convolution and
activation. In the convolution, vertex information is first gathered to the
hyperedge for storage, and then each vertex aggregates the information of the
connected hyperedges.
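
The Poincaré ball operations of Eqs. (7.9) to (7.12) are short enough to check numerically. The sketch below implements them on plain Python lists with the conformal factor $\lambda_x^k = 2 / (1 - k\|x\|^2)$; the function names and sample points are assumptions made here. A useful sanity check is that the logarithmic map undoes the exponential map at the same base point.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def mobius_add(x, y, k):
    """Mobius addition of Eq. (7.9)."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1 + 2 * k * xy + k * yy
    coef_y = 1 - k * xx
    denom = 1 + 2 * k * xy + k * k * xx * yy
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def distance(x, y, k):
    """Distance of Eq. (7.10)."""
    diff = mobius_add([-a for a in x], y, k)
    return 2 / math.sqrt(k) * math.atanh(math.sqrt(k) * norm(diff))


def exp_map(x, z, k):
    """Exponential map of Eq. (7.11) for a tangent vector z != 0 at x."""
    lam = 2 / (1 - k * dot(x, x))
    nz = norm(z)
    scale = math.tanh(math.sqrt(k) * lam * nz / 2) / (math.sqrt(k) * nz)
    return mobius_add(x, [scale * a for a in z], k)


def log_map(x, y, k):
    """Logarithmic map of Eq. (7.12) for y != x."""
    lam = 2 / (1 - k * dot(x, x))
    diff = mobius_add([-a for a in x], y, k)
    nd = norm(diff)
    scale = 2 / (math.sqrt(k) * lam) * math.atanh(math.sqrt(k) * nd) / nd
    return [scale * a for a in diff]


x, z, k = [0.1, 0.2], [0.3, -0.4], 1.0
y = exp_map(x, z, k)
print(y, log_map(x, y, k))   # the second list should reproduce z
```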

Fig. 7.3 The transformation between the tangent space and the hyperbolic space

Note that the initial data lie in the Euclidean space and need to be converted
into embeddings in the hyperbolic space. The Euclidean features are therefore first
projected onto the hyperbolic manifold, so that the spectral-based hyperbolic
hypergraph convolutional network can learn the information needed to update the
node embeddings. Set $t := (\sqrt{k}, 0, 0, \ldots, 0) \in \mathbb{P}^{d,k}$
as a reference point to perform tangent space operations. This choice
makes $\langle (0, x^{0,E}), t \rangle = 0$ hold, so $(0, x^{0,E})$ can be regarded as the initial
embedding of the hypergraph structure on the tangent space
$\mathcal{T}_t\mathbb{P}^{d,k}$ of the hyperbolic manifold. The initial hypergraph structure
embedding is then mapped onto the hyperbolic manifold $\mathbb{P}$ following [19]:


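$$
\begin{aligned}
x^{0,\mathbb{P}} &= \exp_t^k\left((0, x^{0,E})\right) \\
&= \left(\sqrt{k}\cosh\left(\frac{\|x^{0,E}\|_2}{\sqrt{k}}\right),\ \sqrt{k}\sinh\left(\frac{\|x^{0,E}\|_2}{\sqrt{k}}\right)\frac{x^{0,E}}{\|x^{0,E}\|_2}\right).
\end{aligned}
\tag{7.13}
$$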


Unlike the previous study [23], which simply generates the hyperedge structure for
common-domain convolution, and inspired by HGNN [2], hypergraph computation can
here be conducted from the perspective of spectral convolution.

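Given hyperbolic curvatures $-1/k_{\ell-1}$ and $-1/k_\ell$ at layers $\ell - 1$ and $\ell$, respectively,
the hyperbolic hypergraph convolution of the hypergraph input signal $x^{\mathbb{P}}$ with
filter g can be defined as


$$
\begin{aligned}
x^{\mathbb{P}} \star g &= \exp^{k_\ell}\left(\sigma\left(\Phi\left(\left(\Phi^\top \log^{k_{\ell-1}}(x^{\mathbb{P}})\right) \odot \left(\Phi^\top g\right)\right)\right)\right) \\
&= \exp^{k_\ell}\left(\sigma\left(\Phi\, g(\Lambda)\, \Phi^\top \log^{k_{\ell-1}}(x^{\mathbb{P}})\right)\right),
\end{aligned}
\tag{7.14}
$$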
where $\odot$ is the element-wise product, $g(\Lambda) = \mathrm{diag}(\theta)$, and
$\theta = [\theta_1, \cdots, \theta_n]$ are the parameters to be learned. By leveraging the exp
and log maps, the tangent space can be used to perform Euclidean transformations. The
operation is carried out in the tangent space of each center point $x^{\mathbb{P}}$ because the
Euclidean approximation is best there [19].

Because of the high computational complexity of the Fourier transform and
inverse Fourier transform, this convolution is very expensive to calculate.
Convolutions can be computed more efficiently by truncating Chebyshev polynomials,
as in [2]. It can be simply expressed as


$$
x^{\mathbb{P}} \star g \approx \exp^{k_\ell}\left(\sigma\left(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} \log^{k_{\ell-1}}(x^{\mathbb{P}})\right)\right), \tag{7.15}
$$


where W is the initial weight of hyperedges. The above equation uses the hypergraph
Laplacian matrix to calculate the total gain obtained after a small perturbation
of a point. For a hypergraph with n vertices, the convolution layer can be formulated as


$$
x^{\ell} = \exp^{k_\ell}\left(\sigma\left(A \log^{k_{\ell-1}}(x^{\ell-1})\, \Theta\right)\right), \tag{7.16}
$$


where $\Theta \in \mathbb{R}^{c_{\ell-1} \times c_\ell}$ is the parameter to be learned during the training process,
which is applied over the vertices in the hypergraph to extract features, c indicates
the size of the embedding dimension, $\sigma$ denotes the nonlinear activation function,
and $A = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$.
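As a rough illustration of Eq. (7.16), the layer can be sketched as "log map, Euclidean hypergraph convolution, activation, exp map". The sketch below makes two simplifying assumptions that are not taken from the text: the maps are evaluated at the origin of the Poincaré ball (where they reduce to a simple rescaling), and ReLU is used for $\sigma$. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12

def propagation_matrix(H, w):
    """A = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for incidence H and hyperedge weights w."""
    d_v = H @ w            # weighted vertex degrees
    d_e = H.sum(axis=0)    # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    return Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt

def exp0(V, k):
    """Exponential map at the ball origin (assumed reference point), applied row-wise."""
    norm = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), EPS)
    return np.tanh(np.sqrt(k) * norm) * V / (np.sqrt(k) * norm)

def log0(Y, k):
    """Logarithmic map at the ball origin, applied row-wise."""
    norm = np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), EPS)
    arg = np.clip(np.sqrt(k) * norm, 0.0, 1.0 - 1e-7)  # keep arctanh finite
    return np.arctanh(arg) * Y / (np.sqrt(k) * norm)

def hyperbolic_hgnn_layer(X_hyp, A, Theta, k_in, k_out):
    X_tan = log0(X_hyp, k_in)                 # manifold -> tangent space
    Z = np.maximum(A @ X_tan @ Theta, 0.0)    # Euclidean hypergraph convolution + ReLU
    return exp0(Z, k_out)                     # tangent space -> manifold

rng = np.random.default_rng(0)
H = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)        # 4 vertices, 3 hyperedges
w = np.ones(3)
A = propagation_matrix(H, w)
k_in, k_out = 1.0, 0.5
X_hyp = exp0(0.1 * rng.normal(size=(4, 3)), k_in)   # initial hyperbolic embeddings
Theta = rng.normal(size=(3, 2))
out = hyperbolic_hgnn_layer(X_hyp, A, Theta, k_in, k_out)
```

The output rows stay inside the ball of the target curvature because the exponential map is applied last; a full implementation would use the reference point and maps chosen in the text rather than the origin-based shortcut assumed here.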

The hyperbolic operation is accomplished by conducting feature mapping 
between the Euclidean space and the hyperbolic space. The framework of the 
above spectral-based hyperbolic hypergraph convolution is shown in Fig. 7.4. 

Fig. 7.4 The framework of the spectral-based hyperbolic hypergraph convolution method


7.3 Spatial-Based Neural Networks on Hypergraph


To introduce spatial-based neural networks on hypergraphs, we first briefly review
the definition of spatial-based graph convolution, taking image processing as an
example. The pixels in an image can be represented as vertices in a grid
graph, where each vertex connects only to its neighbor vertices in the spatially close
region where it is located. A C-channel feature can accordingly be generated for
each vertex (pixel) in the image. The process of filtering an image can be viewed
as an average aggregation of the neighbors' features after they have been transformed.
Similar to convolutional neural networks in image processing, spatial-based
graph convolution combines the neighbors of the central vertex to produce
a new representation. Spatial-based graph convolution runs from neighbor vertices
to center vertices, which is similar to the definition of a path in a simple graph. A
path in a graph is defined as $P(v_1, v_k) = (v_1, v_2, \ldots, v_k)$. Consecutive vertices
in the sequence are adjacent to each other; that is, all vertex pairs $v_i$ and
$v_{i+1}$ ($1 \le i \le k - 1$) have the neighbor relation.

Similar to spatial-based graph convolution, spatial-based hypergraph neural
networks also consider neighbor information when learning representations.
In the following, we introduce two typical spatial-based hypergraph neural networks:
general hypergraph neural networks (HGNN+) [16] and dynamic hypergraph
neural networks (DHGNN) [15].


7.3.1 General Hypergraph Neural Networks 


In this part, the general framework [16] for representation learning
using hypergraph neural networks on given raw data is introduced. Figure 7.5
shows the framework of general hypergraph neural networks, which
consists of two procedures, i.e., hypergraph modeling and hypergraph convolution.
In the hypergraph modeling step, the data are used to generate the high-order correlations,
which are represented as a hypergraph. Similar to previous tasks, hyperedge
groups can be generated from pairwise edges, k-Hop neighbors, and neighbors in the feature
space. As a result of this procedure, all types of hyperedge groups (if they are
available) are generated and concatenated into one hypergraph for the purpose of
data correlation modeling. In the hypergraph convolution step, a set of hypergraph
convolutions, i.e., the spectral-based convolution and the spatial-based hypergraph
convolution, is applied to the given hypergraph for representation learning. These
convolution procedures can generate more accurate representations of multi-modal
data by exploiting the high-order correlations.

(1) Hypergraph Modeling

The first step is to construct a flexible hypergraph from raw data if no
hypergraph exists, so that the data correlations can be modeled using a hypergraph
structure. The ability to generate a suitable hypergraph structure is critical to exploiting
the high-order correlations among the data. Generally, hypergraph structures are
not explicit in most cases. Therefore, different strategies are needed to generate the
hypergraph. Hypergraph generation from scratch usually involves a combination
of three scenarios, namely, data with graph structure, data without graph structure,
and data with multi-modal/multi-type representations. Hyperedge generation
strategies that employ pairwise edges, k-Hop neighbors, and neighbors in the feature
space, respectively, are introduced here. The pairwise-edge and k-Hop strategies
are used for hyperedge group generation from data with a graph structure,
and the feature-space neighbor strategy is used for hyperedge group
generation from data without a graph structure. Finally, all the hyperedge groups
are concatenated to generate the overall hypergraph.

The above strategies can be used here to generate a number of hyperedge groups.
A final hypergraph is then generated by further combining generated or natural
hyperedge groups. Suppose there are K hyperedge groups
$\{\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_K\}$, with K incidence matrices
$H_k \in \{0, 1\}^{N \times M_k}$, respectively. For the hypergraph $\mathcal{G}$,
the simplest way to construct the incidence matrix is to directly concatenate
all the hyperedge groups as $H = H_1 \| H_2 \| \cdots \| H_K$, where $\cdot \| \cdot$ is the matrix
concatenation operation. The hyperedge weights of the hypergraph can all be
assigned a value of 1 in order to treat the hyperedges equally. This simplest fusion
strategy is called coequal fusion.
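Coequal fusion amounts to a column-wise concatenation of the incidence matrices with an identity weight matrix. A minimal NumPy sketch, with two small hypothetical hyperedge groups on three vertices, is given below.

```python
import numpy as np

# Hypothetical hyperedge groups on N = 3 vertices (rows) .
H_pair = np.array([[1, 0],
                   [1, 1],
                   [0, 1]])          # two hyperedges from pairwise edges
H_feat = np.array([[1],
                   [1],
                   [1]])             # one hyperedge from feature-space neighbors

# Coequal fusion: concatenate along the hyperedge axis, unit weight per hyperedge.
H = np.concatenate([H_pair, H_feat], axis=1)
W = np.eye(H.shape[1])
```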

Other combination strategies can also be used according to
different application scenarios. Because the multi-modal hybrid high-order correlations
cannot be fully exploited by a simple coequal fusion, due to differences in
information richness between hyperedge groups, an adaptive strategy for the fusion
of hyperedge groups, namely Adaptive Fusion, was introduced in [16]. Specifically,
each hyperedge group is associated with a trainable parameter that can be used to
adjust the effect of multiple hyperedge groups on the final vertex embedding in an
adaptive manner, which can be defined as


$$
\begin{cases}
\mathbf{w}_k = \mathrm{copy}(\mathrm{sigmoid}(w_k), M_k) \\
W = \mathrm{diag}(w_1^1, \ldots, w_1^{M_1}, \ldots, w_K^1, \ldots, w_K^{M_K}) \\
H = H_1 \| H_2 \| \cdots \| H_K
\end{cases},
\tag{7.17}
$$


where $w_k \in \mathbb{R}$ is a trainable parameter that is shared by all hyperedges inside a
specified hyperedge group k, and sigmoid(·) is an element-wise normalization function.
$\mathbf{w}_k = (w_k^1, \ldots, w_k^{M_k}) \in \mathbb{R}^{M_k}$ denotes the generated weight vector for hyperedge
group k. The function copy(a, b) returns a vector of size b whose entries are all
copies of a. Let $M = M_1 + M_2 + \cdots + M_K$ denote the
total number of hyperedges in all hyperedge groups. $W \in \mathbb{R}^{M \times M}$ is a diagonal
matrix that indicates the weight matrix of the hypergraph, and each entry $W^{ii}$ denotes
the weight of the corresponding hyperedge $e_i$. By concatenating ($\cdot \| \cdot$) the incidence
matrices of multiple hyperedge groups, $H \in \{0, 1\}^{N \times M}$ denotes the incidence
matrix of the generated hypergraph.
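The construction in Eq. (7.17) can be sketched as follows. In a real model the group parameters $w_k$ would be trainable tensors in a deep learning framework; here they are plain floats, and the function name `adaptive_fusion` is a hypothetical helper used only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def adaptive_fusion(H_groups, group_params):
    """Build H and W of Eq. (7.17) from per-group incidence matrices and scalars w_k."""
    # copy(sigmoid(w_k), M_k): repeat the squashed group parameter once per hyperedge.
    weights = [np.full(H_k.shape[1], sigmoid(w_k))
               for H_k, w_k in zip(H_groups, group_params)]
    W = np.diag(np.concatenate(weights))
    H = np.concatenate(H_groups, axis=1)
    return H, W

H_1 = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)   # M_1 = 2 hyperedges
H_2 = np.array([[1], [1], [1]], dtype=float)            # M_2 = 1 hyperedge
H, W = adaptive_fusion([H_1, H_2], group_params=[0.0, 2.0])
```

With these example parameters, the two hyperedges of the first group share the weight sigmoid(0) = 0.5, while the single hyperedge of the second group receives sigmoid(2).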

Multi-modal/multi-type data can be analyzed to generate multiple hyperedge
groups. From the constructed hyperedge groups, the hypergraph incidence matrix
H and hyperedge weight matrix W can be generated, which can then be fed into the
hypergraph convolution layer for further processing.


(2) Hypergraph Convolution 

Following Definitions 1, 2, and 3, an aggregation of neighbor vertex messages via
hyperpath is introduced for one spatial hypergraph convolution layer. Given a vertex
$\alpha \in \mathcal{V}$ of hypergraph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, W\}$, the aim is to aggregate messages
from its hyperedge inter-neighbor set $\mathcal{N}_e(\alpha)$. To obtain the message of
each hyperedge $\beta$ in the hyperedge inter-neighbor set $\mathcal{N}_e(\alpha)$, messages are first
aggregated from its vertex inter-neighbor set $\mathcal{N}_v(\beta)$. The two steps of hypergraph
convolution thus form a closed loop from vertex feature set $X^t$ to $X^{t+1}$. A general
spatial hypergraph convolution in the t-th layer can be defined as


$$
\begin{cases}
m_\beta^t = \sum_{\alpha \in \mathcal{N}_v(\beta)} M_v^t(x_\alpha^t) \\
y_\beta^t = U_e^t(w_\beta, m_\beta^t) \\
m_\alpha^{t+1} = \sum_{\beta \in \mathcal{N}_e(\alpha)} M_e^t(x_\alpha^t, y_\beta^t) \\
x_\alpha^{t+1} = U_v^t(x_\alpha^t, m_\alpha^{t+1})
\end{cases},
\tag{7.18}
$$


where $x_\alpha^t \in X^t$ denotes the input feature vector of vertex $\alpha \in \mathcal{V}$ in layer
$t = 1, 2, \ldots, T$, and $x_\alpha^{t+1}$ denotes the updated feature of vertex $\alpha$. $m_\beta^t$ denotes
the message of hyperedge $\beta \in \mathcal{E}$, and $w_\beta$ denotes a weight associated with
hyperedge $\beta$. $m_\alpha^{t+1}$ denotes the message of vertex $\alpha$. $y_\beta^t$ denotes the hyperedge
feature of hyperedge $\beta$, which is an element of the hyperedge feature set
$Y^t = \{y_1^t, y_2^t, \ldots, y_M^t\}$, $y_i^t \in \mathbb{R}^{C^t}$, in layer t. $M_v^t(\cdot)$, $U_e^t(\cdot)$, $M_e^t(\cdot)$, and $U_v^t(\cdot)$ are the vertex
message function, hyperedge update function, hyperedge message function, and
vertex update function in the t-th layer, respectively, which can be defined for specified
applications.


With the high-order relationship in the hypergraph structure, the spatial hypergraph
convolution layer is designed for high-level representation learning. In
comparison with graph convolution, which consists of a single stage of message
passing, the spatial hypergraph convolution is composed of four flexible operations
with learned differentiable functions. As with neighbor relations in a graph, there is no
natural ordering among the inter-neighbors of vertices and hyperedges. Therefore,
a summation is used to aggregate the vertex and hyperedge messages produced by the
$M_v^t(\cdot)$ and $M_e^t(\cdot)$ operations.

A simple spatial hypergraph convolution layer (named HGNNConv+), obtained by
specifying the message-update functions (vertex message function $M_v^t(\cdot)$, hyperedge
update function $U_e^t(\cdot)$, hyperedge message function $M_e^t(\cdot)$, and vertex update
function $U_v^t(\cdot)$), is introduced as


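$$
\begin{cases}
M_v^t(x_\alpha^t) = \dfrac{x_\alpha^t}{|\mathcal{N}_v(\beta)|} \\
U_e^t(w_\beta, m_\beta^t) = w_\beta \cdot m_\beta^t \\
M_e^t(x_\alpha^t, y_\beta^t) = \dfrac{y_\beta^t}{|\mathcal{N}_e(\alpha)|} \\
U_v^t(x_\alpha^t, m_\alpha^{t+1}) = \sigma\left(m_\alpha^{t+1} \cdot \Theta^t\right)
\end{cases},
\tag{7.19}
$$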


where $\Theta^t \in \mathbb{R}^{C^t \times C^{t+1}}$ denotes a trainable parameter of layer t, learned in the training
phase, and $\sigma(\cdot)$ denotes an arbitrary nonlinear activation function such as ReLU(·).
Note that in Eq. (7.19), $x_\alpha^t / |\mathcal{N}_v(\beta)|$ and $y_\beta^t / |\mathcal{N}_e(\alpha)|$ denote the normalized vertex
and hyperedge features; the normalization helps accelerate convergence and reduce
jittering.

For faster forward propagation of HGNNConv+ on GPU/CPU devices, we
rewrite it in matrix format. Consider $X^t$ as the input vertex feature set of
layer t. From Definitions 1 and 2, the incidence matrix $H \in \{0, 1\}^{N \times M}$ determines the
hyperedge inter-neighbors of each vertex feature in $X^t$. Hence, it can be used to guide each vertex
to aggregate and generate the hyperedge feature set $Y^t$, which can be formulated as
$Y^t = W D_e^{-1} H^\top X^t$. In a similar way, the process of updating the vertex feature set $X^{t+1}$
from the hyperedge feature set $Y^t$ can be formulated as $X^{t+1} = \sigma(D_v^{-1} H Y^t \Theta^t)$. Thus,
the matrix format of HGNNConv+ can be written as


$$
X^{t+1} = \sigma\left(D_v^{-1} H W D_e^{-1} H^\top X^t \Theta^t\right). \tag{7.20}
$$
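To see that the message-passing form of Eq. (7.19) and the matrix form of Eq. (7.20) describe the same computation, both can be implemented and compared numerically. The sketch below assumes ReLU for $\sigma$ and takes $D_v$ as the plain count of hyperedges incident to each vertex, so that it matches $|\mathcal{N}_e(\alpha)|$ in Eq. (7.19); the function names are hypothetical.

```python
import numpy as np

def hgnnconv_plus_loops(X, H, w, Theta):
    """Two-stage vertex -> hyperedge -> vertex message passing, Eq. (7.19)."""
    N, M = H.shape
    Y = np.zeros((M, X.shape[1]))
    for b in range(M):
        members = np.flatnonzero(H[:, b])                 # vertex inter-neighbors of b
        m_b = sum(X[a] / len(members) for a in members)   # normalized vertex messages
        Y[b] = w[b] * m_b                                 # hyperedge update
    out = np.zeros((N, Theta.shape[1]))
    for a in range(N):
        edges = np.flatnonzero(H[a])                      # hyperedge inter-neighbors of a
        m_a = sum(Y[b] / len(edges) for b in edges)       # normalized hyperedge messages
        out[a] = np.maximum(m_a @ Theta, 0.0)             # vertex update with ReLU
    return out

def hgnnconv_plus_matrix(X, H, w, Theta):
    """Matrix form, Eq. (7.20), with Dv and De as unweighted degree counts (assumption)."""
    Dv_inv = np.diag(1.0 / H.sum(axis=1))
    De_inv = np.diag(1.0 / H.sum(axis=0))
    return np.maximum(Dv_inv @ H @ np.diag(w) @ De_inv @ H.T @ X @ Theta, 0.0)

rng = np.random.default_rng(0)
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)     # 4 vertices, 3 hyperedges
w = np.array([0.5, 1.0, 2.0])              # hyperedge weights
X = rng.normal(size=(4, 3))
Theta = rng.normal(size=(3, 2))
out_loops = hgnnconv_plus_loops(X, H, w, Theta)
out_matrix = hgnnconv_plus_matrix(X, H, w, Theta)
```

The matrix form replaces the two Python loops with dense products, which is what makes it suitable for batched execution on GPU/CPU devices.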


Similar to HGNN, $X^{t+1}$ is obtained after convolution and can be used
for further learning. As an extension of HGNN [2], this method employs a broad
multi-modal/multi-type data correlation model and learns an adaptive weight for each
modality/type representation using a single hypergraph model.


7.3.2 Dynamic Hypergraph Neural Networks


Dynamic hypergraph neural networks (DHGNN) [15] are neural networks that
model dynamically evolving hypergraph structures. They are composed of
stacked layers of two modules: dynamic hypergraph construction and hypergraph
convolution. The dynamic hypergraph construction module dynamically updates
the hypergraph structure on each layer, as the initially constructed hypergraph may not
be an appropriate representation of the data. After that, hypergraph convolution is
introduced as a means of encoding high-order correlations between data points
within a hypergraph. There are two phases in the hypergraph convolution module,
vertex convolution and hyperedge convolution, which are designed to
aggregate features among vertices and hyperedges, respectively.


(1) Dynamic Hypergraph Construction 

The symbol Con(e) is used to denote the vertex set that a hyperedge e contains, and the
symbol Adj(v) is used to denote the set of all hyperedges containing
the vertex v:


$$
\mathrm{Con}(e) = \{v_1, v_2, \ldots, v_{k_e}\}, \tag{7.21}
$$
$$
\mathrm{Adj}(v) = \{e_1, e_2, \ldots, e_{k_v}\}, \tag{7.22}
$$


where $k_e$ and $k_v$ are the number of vertices in hyperedge e and the number of
hyperedges containing vertex v, respectively. v is defined as the centroid vertex of the
hyperedge set Adj(v). Here, traditional k-NN and k-means clustering methods can
be combined for dynamic hypergraph construction to exploit local and global
structures. On the one hand, the k − 1 nearest neighbors are computed for each
vertex v. These neighboring vertices, along with the vertex v, form a hyperedge in
Adj(v). On the other hand, the k-means algorithm is run on the whole feature
map of each layer according to the Euclidean distance. For each vertex, the nearest
S − 1 clusters are assigned as the adjacent hyperedges of this vertex. Here,
|Adj(v)| denotes the size of the adjacent hyperedge set, $x_e$ denotes adjacent hyperedge
features, and $x_u$ denotes the centroid vertex feature; W and b are learnable parameters
(these symbols are used in Eqs. (7.25) and (7.26)).
This procedure is performed on the feature embedding of each layer. In
particular, the hypergraph structure is initialized with the input feature embedding.
The hyperedge set is therefore dynamically adjusted as the feature embedding evolves
with the network going deeper. In this way, better hypergraph
structures can be obtained for high-order data correlation modeling with deep neural networks.
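The construction described above can be sketched with NumPy. This is an illustrative sketch only: the k-means routine is a plain Lloyd iteration written for this example rather than the implementation used in [15], the data points are assumed to be distinct, and all function names are hypothetical.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def knn_hyperedges(X, k):
    """One local hyperedge per vertex: the vertex plus its k-1 nearest neighbors."""
    order = np.argsort(pairwise_dist(X, X), axis=1)[:, :k]  # column 0 is the vertex itself
    return [set(row.tolist()) for row in order]

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd iterations (illustrative stand-in for any k-means routine)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(pairwise_dist(X, centers), axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    labels = np.argmin(pairwise_dist(X, centers), axis=1)
    return centers, labels

def nearest_clusters(X, centers, s):
    """For each vertex, indices of the S-1 nearest clusters (its global hyperedges)."""
    return np.argsort(pairwise_dist(X, centers), axis=1)[:, : s - 1]

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, size=(5, 2)),
                    rng.normal(3.0, 0.1, size=(5, 2))])   # feature map of one layer
k, n_clusters, s = 3, 3, 3

local_edges = knn_hyperedges(X, k)                        # Con(e) for the k-NN hyperedges
centers, labels = kmeans(X, n_clusters)
cluster_edges = [set(np.flatnonzero(labels == c).tolist()) for c in range(n_clusters)]
adjacent = nearest_clusters(X, centers, s)                # cluster hyperedges in Adj(v)
```

Re-running these steps on the feature embedding produced by each layer yields the layer-wise update of the hypergraph structure described in the text.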


(2) Dynamic Hypergraph Convolution 

Hypergraph convolution is composed of two sub-modules: the vertex convolution
sub-module and the hyperedge convolution sub-module. By using vertex convolution, vertex
features are aggregated to the hyperedge, and then by using hyperedge convolution,
adjacent hyperedge features are aggregated to the center vertex.

Several pooling methods can be used for vertex aggregation, including maximum
pooling and average pooling. Vertex aggregation in state-of-the-art algorithms involves a
fixed, pre-computed transform matrix generated from the graph or hypergraph structure.
Nevertheless, such methods cannot effectively model discriminative information
among vertex features. For feature permutation and weighting, the transform
matrix T is instead learned from the vertex features. Information can flow within and between
channels using the transform matrix. A multi-layer perceptron (MLP) is used to obtain
the transform matrix T, and the transformed features are compressed by convolution
as follows:


$$
T = \mathrm{MLP}(X_u) \tag{7.23}
$$

and

$$
x_e = \mathrm{conv}\left(T \cdot \mathrm{MLP}(X_u)\right). \tag{7.24}
$$


(3) Hyperedge Convolution 

Here, the hyperedge convolution follows the spatial convolution strategy,
which consists of the aggregation of hyperedge features to center vertex features.
Hyperedge convolution employs a multi-layer perceptron to generate weight scores
for each hyperedge. The center vertex feature is then computed as a weighted sum of
the input hyperedge features. This procedure can be formulated as
follows:


$$
w = \mathrm{softmax}(x_e W + b) \tag{7.25}
$$

and

$$
x_u = \sum_{i=0}^{|\mathrm{Adj}(u)|} w^i x_e^i. \tag{7.26}
$$
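Equations (7.23)-(7.26) can be illustrated with a small NumPy sketch. Several simplifications are assumed here that are not part of the original method: each MLP is replaced by a single linear map, the 1-D convolution that compresses the k vertices of a hyperedge is written as a weighted sum over vertices, and all parameter names (`W_t`, `W_f`, `conv_w`) are hypothetical.

```python
import numpy as np

def softmax(a, axis=0):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def vertex_conv(X_u, W_t, W_f, conv_w):
    """X_u: (k, d) features of the k vertices in one hyperedge -> (d,) hyperedge feature."""
    T = X_u @ W_t            # (k, k) transform matrix, Eq. (7.23), linear stand-in for the MLP
    F = X_u @ W_f            # (k, d) transformed features, linear stand-in for the MLP
    return conv_w @ (T @ F)  # compress the k permuted/weighted vertices, Eq. (7.24)

def hyperedge_conv(X_e, W, b):
    """X_e: (S, d) features of the adjacent hyperedges -> (d,) center vertex feature."""
    scores = softmax(X_e @ W + b, axis=0)     # (S, 1) attention weights, Eq. (7.25)
    return (scores * X_e).sum(axis=0)         # weighted sum, Eq. (7.26)

rng = np.random.default_rng(0)
k, d, S = 4, 3, 5
X_u = rng.normal(size=(k, d))
x_e = vertex_conv(X_u, rng.normal(size=(d, k)), rng.normal(size=(d, d)), rng.normal(size=k))

X_e = rng.normal(size=(S, d))
x_u = hyperedge_conv(X_e, rng.normal(size=(d, 1)), 0.1)

# If all adjacent hyperedges carry the same feature, the weighted sum returns it unchanged.
same = np.tile(X_e[0], (S, 1))
x_same = hyperedge_conv(same, rng.normal(size=(d, 1)), 0.1)
```

Because the softmax weights sum to one, the center vertex feature is always a convex combination of the adjacent hyperedge features.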


Existing deep learning techniques of this kind take the graph/hypergraph structure
as prior knowledge for training the model. There
are, however, a number of hidden and important relationships that are not directly
represented in the inherent structure. For vertex convolution, a transform matrix
is employed to permute and weight vertices within hyperedges; for hyperedge
convolution, an attention mechanism is employed to aggregate adjacent hyperedge
features. Figure 7.6 shows the architecture of DHGNN. The first part of
the figure illustrates the process of hyperedge construction; for example, two
hyperedges are generated from two clusters (dashed ellipses). In the
second part, vertices within a hyperedge are aggregated to form a hyperedge
feature through vertex convolution, and adjacent hyperedge features are
aggregated to form a center vertex feature via hyperedge convolution. In the third
part, after performing such operations on all vertices in the current layer feature
embedding, the new layer feature embedding and the new hypergraph structure can
be constructed.


7.4 Comparison Between Graph and Hypergraph Neural Networks


After the previous introduction to the spectral-based and spatial-based hypergraph
neural networks, we have a basic understanding of how these methods are
implemented. In this section, we compare hypergraph neural networks with
simple graph neural networks from the spectral and spatial perspectives to discover the
connections and differences between them. The most typical methods of the two
families are chosen for the comparison. In terms of convolution, GNN is the classical
operator designed to operate on graphs, such as [6, 18, 24, 25]. HGNN [2] and
HGNN+ [16] are compared with GNN [18] from the spectral
perspective and the spatial perspective, respectively. Furthermore, the extended learning
domain of the hypergraph emphasizes the connection.


7.4.1 Spectral Perspective 


It can be proved that the GNN can be mathematically viewed as a special case of
HGNN. Under the assumption that every hyperedge connects only two vertices and
all hyperedges have equal weight, the simple hypergraph (2-uniform hypergraph)
can also be expressed as a graph with adjacency matrix A and vertex
degree matrix D, which is a construction similar to $\mathcal{G}_{pair}$. The hypergraph itself is
described by the incidence matrix H, the vertex degree matrix $D_v$, the hyperedge degree
matrix $D_e$, and the hyperedge weight matrix W. Under such circumstances, the
following relations hold for the simple hypergraph:


$$
\begin{cases}
H H^\top = A + D \\
D_e^{-1} = \frac{1}{2} I \\
W = I
\end{cases}.
\tag{7.27}
$$


Substituting these into the hypergraph convolution, with $D_v = D$, gives


$$
\begin{aligned}
X^{t+1} &= \sigma\left(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X^t \Theta^t\right) \\
&= \sigma\left(D_v^{-1/2} H \left(\tfrac{1}{2} I\right) H^\top D_v^{-1/2} X^t \Theta^t\right) \\
&= \sigma\left(\tfrac{1}{2} D^{-1/2} (A + D) D^{-1/2} X^t \Theta^t\right) \\
&= \sigma\left(\tfrac{1}{2} \left(I + D^{-1/2} A D^{-1/2}\right) X^t \Theta^t\right) \\
&= \sigma\left(D^{-1/2} \hat{A} D^{-1/2} X^t \hat{\Theta}^t\right),
\end{aligned}
\tag{7.28}
$$

where $\hat{A} = A + D$, so that $D^{-1/2} \hat{A} D^{-1/2} = I + D^{-1/2} A D^{-1/2}$, and
$\hat{\Theta}^t = \frac{1}{2} \Theta^t$. The extra factor $\frac{1}{2}$ can be absorbed by the
learnable parameter $\Theta^t$. Thus, when modeling a simple graph, the spectral-based
hypergraph convolution in HGNN [2] takes the same form as the
graph convolution in GCN [18]. Owing to its expressive capability, the
hypergraph convolution not only models and learns the high-order correlations in a
hypergraph but can also handle simple graphs.
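The reduction in Eqs. (7.27) and (7.28) can be checked numerically on a small example. The sketch below builds the incidence matrix of a hypothetical four-vertex graph, treats every edge as a two-vertex hyperedge with unit weight, and compares the hypergraph propagation matrix with $\frac{1}{2}(I + D^{-1/2} A D^{-1/2})$.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]   # a small example graph
n = 4

# Incidence matrix of the 2-uniform hypergraph and adjacency matrix of the graph.
H = np.zeros((n, len(edges)))
A = np.zeros((n, n))
for j, (u, v) in enumerate(edges):
    H[u, j] = H[v, j] = 1.0
    A[u, v] = A[v, u] = 1.0

D = np.diag(A.sum(axis=1))                 # graph degree matrix
Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H.sum(axis=1)))
De_inv = np.diag(1.0 / H.sum(axis=0))      # equals I / 2 for a 2-uniform hypergraph
W = np.eye(len(edges))

hyper_prop = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
graph_prop = 0.5 * (np.eye(n) + D_inv_sqrt @ A @ D_inv_sqrt)
```

Both propagation matrices coincide, so the two layers differ only by the constant factor that is absorbed into the learnable parameter.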


7.4.2 Spatial Perspective


A powerful GNN model can be viewed as learning to embed rooted subtrees into a
low-dimensional space [26]. A rooted subtree [27] can describe not only the
connections of local vertices but also the message passing paths in a
graph. The rooted subtree can therefore be used to compare HGNN+ [16] with
GNN [18]. In a hypergraph, a node in the rooted subtree can be either
a vertex or a hyperedge in order to satisfy the path definition (also known as the
message passing path).

Comparing graph structures that are isomorphic is more straightforward. Therefore,
the 2-uniform hypergraph (each hyperedge connects only two vertices) is used for the
comparison. Figure 7.7 displays the rooted subtrees of HGNN+ [16] and GNN [18]
for a specified vertex, which can also be expressed as the message path in a graph
and the hyperpath in a hypergraph. In graph convolution, the vertex
features of the neighbors are taken into account, and these features are then aggregated
to update the central vertex feature at the end of the process. The hypergraph
convolution layer, in contrast, can be described as a hierarchical structure that enables
more powerful expression and modeling capabilities: HGNN+ [16] performs a two-stage,
i.e., vertex-hyperedge-vertex, transformation. As formulated in Eq. (7.18), the first
stage generates a hyperedge feature based on the vertex inter-neighbor
set of each hyperedge. In the second stage, the features of the hyperedge inter-neighbor
set are aggregated to obtain the updated features of the vertices. Additionally,
multi-layer hypergraph convolution has many more message interactions than graph
convolution. The rooted vertex appears more frequently in the paths of the HGNN+ [16]
subtrees (like a latent extra self-loop), which accounts for its better performance.
In comparison with graph convolution, hypergraph convolution can efficiently
extract low- and high-order correlations on a hypergraph via the vertex-hyperedge-vertex
transformation.


7.5 Summary 


In this chapter, we introduce two types of hypergraph neural networks:
spectral-based and spatial-based methods. In spectral-based methods, vertex signals are
transformed between the vertex domain and the spectral domain by means of the
hypergraph Laplacian matrix. In spatial-based methods, each vertex is updated by aggregating
information from its neighbors in the spatial domain. We also note that most
learning methods in graph learning are still simple graph neural networks.

Finally, we compare hypergraph neural networks and graph neural networks
from the spectral and spatial perspectives. According to the comparison
of the convolutional computation coefficients, the hypergraph convolution
not only has expressive ability comparable to that of GCN when handling a simple
graph but is also capable of modeling and learning high-order correlations within
the hypergraph. Comparing hypergraph convolution with graph convolution in the
spatial domain, we find that the hypergraph convolution layer can
efficiently extract both low-order and high-order correlations on a hypergraph using
the vertex-hyperedge-vertex transformation.


Chapter 8
Large Scale Hypergraph Computation


Abstract As introduced in the previous chapters, the complexity of hypergraph
computation is relatively high. In practical applications, the hypergraph is often
not small, and we frequently encounter scenarios in which the size of
the hypergraph is very large. Hypergraph computation therefore confronts
complexity issues in many applications, and how to handle large scale data is
an important task. In this chapter, we discuss computation methods for large
scale hypergraphs and their applications. Two types of hypergraph computation
methods are provided to handle large scale data, namely the factorization-based
hypergraph reduction method and the hierarchical hypergraph learning method. In
the factorization-based hypergraph reduction method, the large scale hypergraph
incidence matrix is reduced to two low-dimensional matrices, and the computing
procedures are conducted with the reduced matrices. This method can support
hypergraph computation with more than 10,000 vertices and hyperedges. On
the other hand, the hierarchical hypergraph learning method splits all samples into
several sub-hypergraphs and merges the results obtained from each sub-hypergraph
computation. This method can support hypergraph computation with millions of
vertices and hyperedges.


8.1 Introduction 


Hypergraph computation has been used in many areas such as image analysis [1-3]
and recommendation [4-6]. In practical applications, the hypergraph is often not
small, and its size can be very large in many cases, where
hypergraph computation confronts complexity issues [7-13]. For instance, in
medical image analysis, hypergraphs can be used to model the relationship among
patches within an image or across different images. Here we take gigapixel whole-slide
histopathological images (WSIs) as an example. The large number of pixels
in WSIs poses a great challenge for medical image analysis. If we generate a
hypergraph over the pixels in WSIs, the number of vertices tends to be at the billion
level. Even if we sample patches in WSIs, this number can still be in the tens of
thousands, or at the million level. Conventional hypergraph modeling methods are
highly unlikely to be able to analyze data at such a scale. Another example is the
recommender system. In recommender systems, graphs and hypergraphs have been
very widely used because of their superior structural modeling capabilities. Meanwhile,
the number of users and items on the Internet or in recommender systems can be
at the million to billion level and keeps increasing. Consequently, recommender
systems are one of the typical playgrounds for large scale hypergraph applications.
The large scale problem of hypergraphs is encountered in many other areas, such as
social network analysis, protein relation prediction, and so on.

Under such circumstances, hypergraph computation confronts the large scale
issue, as modeling and computing on hypergraphs have high complexity
in general. To help solve the large scale problem, we introduce two types of
hypergraph computational methods to handle large scale data in this chapter, namely
the factorization-based hypergraph reduction method and the hierarchical hypergraph
computation method. We also introduce their applications in medical image analysis
and recommender systems, respectively. The factorization-based hypergraph
reduction reduces the large scale hypergraph incidence matrix H to two
low-dimensional matrices, leading to a reduction of the complexity. This method can
support hypergraph computation with tens of thousands of vertices. The other
method, i.e., hierarchical hypergraph computation, splits the vertices into several
subsets and computes on each sub-hypergraph separately. The results from these
sub-hypergraphs can be further combined following a hierarchical strategy. This method
can support hypergraph computation with millions of vertices and hyperedges.
Part of the work introduced in this chapter has been published in [8].


8.2 Factorization-Based Big-Hypergraph Modeling 


The complexity of the incidence matrix $H \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{E}|}$ is $O(N^2)$, which rises rapidly
as the number of vertices ($|\mathcal{V}| = N$) and the number
of hyperedges ($|\mathcal{E}| = N$) increase. Although hypergraphs can model high-order complex
associations well, the incidence matrix cannot accommodate a large number of vertices
in traditional hypergraph modeling and transductive computation strategies. This is
one typical bottleneck that limits the applications of hypergraph computation. To
address this problem, the factorization-based hypergraph reduction method [8] is
introduced to handle hypergraph modeling and computing with tens of thousands of
vertices.

Decomposing a high-dimensional matrix into the product of low-dimensional
matrices is an effective way to reduce dimensionality and has been applied in
different areas such as spectral clustering
[14] and recommendation algorithms [15]. For a large-dimensional incidence matrix
H of a hypergraph, matrix decomposition can also be used to find
low-dimensional embeddings of each vertex and hyperedge and thus support large scale
hypergraph computation.

Fig. 8.1 The pipeline of the factorization-based hypergraph reduction method. This figure is from
[8]


As illustrated in Fig. 8.1, the factorization-based hypergraph reduction incor- 
porates a factor embedding component that encodes the relationships between 
hyperedges and vertices into two latent semantic spaces. Due to the low dimension 
of the latent semantic space, it can handle more vertices and hyperedges accordingly. 

The purpose of factorization is to reduce the incidence matrix
H to two latent semantic spaces: the vertex-belonging-hyperedge space, represented by
$H_{v \in \mathcal{E}_v} \in \mathbb{R}^{|\mathcal{V}| \times \varphi}$, and the hyperedge-containing-vertices space, represented by
$H_{e \ni \mathcal{V}_e} \in \mathbb{R}^{|\mathcal{E}| \times \varphi}$,
where $\mathcal{E}_v$ and $\mathcal{V}_e$ represent the hyperedge set containing vertex v and the vertex set
in hyperedge e, respectively, and $\varphi$ is a hyperparameter that represents the latent
semantic space dimension. Figure 8.1 illustrates that the two latent semantic spaces
aim to express all connections between vertices and hyperedges. This procedure is
formulated as below:


$$
\arg\min_{H_{v \in \mathcal{E}_v},\, H_{e \ni \mathcal{V}_e}} \left\| H - H_{v \in \mathcal{E}_v} H_{e \ni \mathcal{V}_e}^\top \right\|. \tag{8.1}
$$


Consequently, the corresponding loss generated by the hypergraph dimensionality
reduction can be written as


$$
\mathcal{L}_H = \left\| H - H_{v \in \mathcal{E}_v} H_{e \ni \mathcal{V}_e}^\top \right\|. \tag{8.2}
$$


The hypergraph Laplacian matrix L is another crucial component of hypergraph
computation; its ordinary form is $L = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$. Since
the incidence matrix H is expressed through two low-dimensional latent semantic spaces, the
low-dimensional factorization-based hypergraph Laplacian $L_F$ is formulated as


$$
L_F = I - D_v^{-1/2} H_{v \in \mathcal{E}_v} \underbrace{H_{e \ni \mathcal{V}_e}^\top W D_e^{-1} H_{e \ni \mathcal{V}_e}}_{X \in \mathbb{R}^{\varphi \times \varphi}} H_{v \in \mathcal{E}_v}^\top D_v^{-1/2}, \tag{8.3}
$$


where $X = H_{e \ni \mathcal{V}_e}^\top W D_e^{-1} H_{e \ni \mathcal{V}_e}$ is an intermediate latent feature multiplication term
of dimension $\varphi$. Because the latent semantic space dimension $\varphi$ is significantly
smaller than the total number of vertices and hyperedges, the
intermediate term X functions as an extended control coefficient matrix.
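The structure of Eqs. (8.1)-(8.3) can be illustrated with a small NumPy sketch. Note the substitution made here: the method in [8] learns the two factors by minimizing the loss in Eq. (8.2), whereas this sketch obtains them from a truncated SVD, which is simply a convenient way to get a low-rank factorization for illustration. The degree vectors are computed from the original incidence matrix, which is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def factorize(H, phi):
    """Rank-phi factors H_v (N x phi) and H_e (M x phi) with H ~= H_v @ H_e.T (via SVD)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    root = np.sqrt(s[:phi])
    return U[:, :phi] * root, Vt[:phi].T * root

def factorized_laplacian(H_v, H_e, w, d_v, d_e):
    """L_F of Eq. (8.3); X is the small phi x phi intermediate term."""
    X = H_e.T @ np.diag(w / d_e) @ H_e
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    L_F = np.eye(len(d_v)) - Dv_inv_sqrt @ H_v @ X @ H_v.T @ Dv_inv_sqrt
    return L_F, X

H = np.array([[1, 0, 0, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 0, 1]], dtype=float)    # 6 vertices, 5 hyperedges
w = np.ones(H.shape[1])
d_v, d_e = H @ w, H.sum(axis=0)

# Ordinary hypergraph Laplacian for reference.
Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
L = np.eye(6) - Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt

def loss(phi):
    H_v, H_e = factorize(H, phi)
    return np.linalg.norm(H - H_v @ H_e.T)       # reconstruction loss, Eq. (8.2)

H_v2, H_e2 = factorize(H, 2)
L_F2, X2 = factorized_laplacian(H_v2, H_e2, w, d_v, d_e)   # low-rank approximation
H_v5, H_e5 = factorize(H, 5)
L_F5, X5 = factorized_laplacian(H_v5, H_e5, w, d_v, d_e)   # full rank: recovers L
```

In this sketch the only matrices whose size depends on both the vertex and hyperedge counts are the two thin factors, while the intermediate term X stays at $\varphi \times \varphi$; at full rank the factorized Laplacian coincides with the ordinary one.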



Fig. 8.2 (a) The whole-slide image for survival prediction; (b) Local feature extraction with con- 
volution networks; (c) Feature aggregation with pairwise relation; (d) Global feature representation 
with high-order relation and multiple spaces. This figure is from [1] 


The factorization-based hypergraph reduction can be used in hypergraph neural 
networks to support large scale computation, which can be used for more than 
10,000 vertices and hyperedges. 

Here we illustrate an application of hypergraph computation to large scale
medical image analysis: survival prediction from whole-slide histopathology images
(WSIs). The goal is to extract valid survival-specific features that reflect the
survival status of a patient from a whole-slide histopathology image and to make
predictions from them. Unlike conventional images, WSI data can be very large, e.g.,
a single image may have billions of pixels, and the correlations within these data are
very complicated. Therefore, hypergraph computation in this application faces the large
scale issue. Most existing image analysis models are designed for natural images of a
much smaller size, such as 256 x 256 pixels. To let such a model handle WSI data, a
number of patches of a moderate size (e.g., 256 x 256) are usually sampled from each
WSI first; these patches are then stacked and fed into a CNN-based feature
extractor (e.g., VGG) to generate a global representation, as shown in Fig. 8.2.
Subsequently, a regression model is applied to the global features to predict the
survival score. These methods have an obvious drawback: the structure of the
entire histopathological image is broken into pieces by patch sampling.

It may be unrealistic to extract all of the structural information at the cellular
level from gigapixel images, because a single histopathological image contains a
massive amount of pixel data. A small number of image patches can be selected to
build graph-based models, from which a global feature can be extracted. However,
the number of sampled patches limits how much of the original image's informative
regions the sampling covers, so a substantial portion of the fields with pathological
features is missed. The incidence matrix, which represents the connectivity between
vertices and hyperedges, is an essential component of the hypergraph neural network.
The large number of vertices and hyperedges in the constructed hypergraph limits the
application of HGNN [16].

Here, we introduce the Big-Hypergraph Factorization Neural Network (b-HGFN) [8],
which uses factorization-based hypergraph reduction to address the
above issue. It incorporates a factor embedding component that encodes the
relationships between hyperedges and vertices into two latent semantic spaces, as
illustrated in Fig. 8.3. Due to the low dimension of the latent semantic space,
b-HGFN can handle more vertices and hyperedges. With the hypergraph reduction,
b-HGFN can provide more accurate feature representations of histopathological
images from more densely sampled patches. The first loss, generated
by the hypergraph dimensionality reduction, can be written as Eq. (8.2). The
hypergraph Laplacian matrix $L$ is another crucial component of b-HGFN, and the
low-dimensional hypergraph factorization Laplacian $L_F$ is formulated as Eq. (8.3).
A standard hypergraph neural network layer is represented as


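$$\mathrm{HGFConv}(\cdot) = \mathcal{D}\left[\sigma\left(\Theta^{(l)} X^{(l)} (I - L_F)\right)\right], \tag{8.4}$$


where $\sigma$ stands for the nonlinear activation function, $\mathcal{D}$ represents the
dropout layer, and $\Theta^{(l)}$ and $X^{(l)}$ are the trainable parameters and the features
of the $l$-th layer. Convolution operations are embedded into the implicit latent semantic
space by modifying the specifics of the convolution layers, which are denoted as follows,
with $\mathcal{X}$ being the intermediate term of Eq. (8.3):


$$\begin{aligned}
\mathrm{HGFConv}(0) &= \mathcal{D}\left[\sigma\left(\Theta^{(0)} X^{(0)} D_v^{-1/2} H_{v \in \mathcal{E}_v}\right)\right] \\
\mathrm{HGFConv}(1) &= \mathcal{D}\left[\sigma\left(\Theta^{(1)} X^{(1)} \mathcal{X}\right)\right] \\
&\;\;\vdots \\
\mathrm{HGFConv}(L-1) &= \mathcal{D}\left[\sigma\left(\Theta^{(L-1)} X^{(L-1)} \mathcal{X}\right)\right] \\
\mathrm{HGFConv}(L) &= \mathcal{D}\left[\sigma\left(\Theta^{(L)} X^{(L)} H_{v \in \mathcal{E}_v}^{\top} D_v^{-1/2}\right)\right]
\end{aligned} \tag{8.5}$$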


According to the HGFConv layers above, the high-dimensional connection relations of the
hypergraph can be embedded in the low-dimensional latent semantic
spaces. To represent global features at the histopathological image level
(i.e., $X \in \mathbb{R}^{1 \times C_{L+1}}$), the output of the last HGFConv layer is
squeezed by a pooling layer after a complete b-HGFN.

The patient survival duration is predicted by a fully connected
neural network once the feature representation of the histopathological image has been
obtained. The hierarchical loss, which incorporates list-wise loss, pairwise loss, and
point-wise loss, has been experimentally demonstrated to be more effective for b-HGFN
than using only the pairwise Bayesian Concordance Readjust (BCR) loss function.
The point-wise loss function applies the negative Cox log partial likelihood loss
as


$$\mathcal{L}_{c} = \sum_{i} \delta_i \Big( -s_i + \log \sum_{j \in \{ j \,:\, t_j \ge t_i \}} \exp(s_j) \Big), \tag{8.6}$$


where $s_i$ and $t_i$ represent the predicted duration and the truth, and $\delta_i$ is the
event indicator of the standard Cox formulation (1 if the event is observed for sample $i$
and 0 if the sample is censored). The list-wise loss and the pairwise loss refer to
NDCGLoss2 derived by LambdaLoss [17] and the BCR loss [2], respectively. Taking into account
the loss function of hypergraph dimension reduction,
the combination of all loss functions can be expressed as


Sy. = A. ü-2A).£g 
Lp = {NDCGLoss2(S, G), BCRLoss(S, G)} . (8.7) 
La = Ly pu 
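
The point-wise term of Eq. (8.6), the negative Cox log partial likelihood, can be sketched in a few lines. This is a minimal NumPy version under two assumptions: `events` holds the event indicator (1 for an observed event, 0 for a censored sample), and no log-sum-exp stabilization is applied, so it is meant for small, well-scaled scores only. The function name is illustrative.

```python
import numpy as np

def cox_partial_likelihood_loss(scores, times, events):
    """Negative Cox log partial likelihood for predicted scores s_i and times t_i."""
    scores = np.asarray(scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    # at_risk[i, j] is True when sample j is still at risk at time t_i (t_j >= t_i).
    at_risk = times[None, :] >= times[:, None]
    log_risk_sum = np.log(np.sum(at_risk * np.exp(scores)[None, :], axis=1))
    # Only samples with an observed event contribute to the sum.
    return float(np.sum(events * (log_risk_sum - scores)))

# Three patients with identical scores: the risk sets have sizes 3, 2, and 1.
loss_all = cox_partial_likelihood_loss([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1])
# Censoring the second patient removes its term from the sum.
loss_censored = cox_partial_likelihood_loss([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 0, 1])
```

With equal scores each term reduces to the logarithm of the risk-set size, which makes the toy values easy to verify by hand.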


The factorization-based hypergraph reduction incorporates a factor embedding 
component that encodes the relationships between hyperedges and vertices into two 
latent semantic spaces. Due to the low dimension of the latent semantic space, 
it can handle more vertices and hyperedges. The factorization-based hypergraph 
reduction can be used in HGNN [16] to solve the large scale problem. The method 
can effectively solve the hypergraph analysis problem with almost 10,000 vertices 
and hyperedges. 


8.3 Hierarchical Hypergraph Modeling 


The factorization-based hypergraph reduction can effectively analyze hypergraphs
with almost 10,000 vertices and hyperedges, but it reaches its limit when
the hypergraph grows to millions of vertices or hyperedges. Figure 8.4
shows a hierarchical hypergraph learning method for large scale hypergraphs with
hierarchical labels, which allows hypergraph neural networks to handle millions of
data points. It is introduced in detail in the following.

Fig. 8.4 An illustration of the hierarchical hypergraph learning

For million-scale unstructured data, it is impractical to convert the whole dataset
into a single large hypergraph to represent the correlations of samples, or to conduct
the factorization-based reduction, since either would require an unrealistically large
incidence matrix or a significant amount of memory. If there are hierarchical
labels in the dataset, hierarchical hypergraph learning can be adopted to solve the
problem. The original dataset $X \in \mathbb{R}^{N \times d}$ can be randomly and uniformly
divided into several subsets with smaller and more affordable scales, where $N$ denotes the
scale of the dataset and $d$ denotes the dimension of each sample. The samples in
the dataset then form the vertices and hyperedges. In each subset, we construct a
sub-hypergraph using the K nearest neighbors algorithm (KNN), which is based on
the Euclidean distance between the representations of each pair of vertices. The
incidence matrix $H_i \in \mathbb{R}^{|\mathcal{V}_i| \times |\mathcal{E}_i|}$, whose entries
are 0 or 1, indicates the correlation between the vertices and the hyperedges.
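
The data division and local hypergraph construction can be sketched as follows. The sketch assumes one common KNN convention, in which every vertex acts as a centroid and spawns one hyperedge containing its k nearest vertices (normally including itself); the text does not fix this detail, and `split_indices` and `knn_incidence` are illustrative names.

```python
import numpy as np

def split_indices(n_samples, n_subsets, seed=0):
    """Randomly divide sample indices into subsets of (almost) equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_subsets)

def knn_incidence(X, k):
    """Binary incidence matrix with H[v, e] = 1 if v is among the k nearest vertices of e."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
    nearest = np.argsort(dist, axis=1)[:, :k]             # row e: k vertices closest to e
    H = np.zeros((n, n))
    H[nearest.ravel(), np.repeat(np.arange(n), k)] = 1.0
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                 # toy dataset: N = 20 samples, d = 4
subsets = split_indices(len(X), n_subsets=4)
sub_hypergraphs = [knn_incidence(X[idx], k=3) for idx in subsets]
```

Each sub-hypergraph only needs a |V_i| x |V_i| incidence matrix, so the memory cost is governed by the subset size rather than by N.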

Given the initial feature matrix of vertices $X_i$ as well as the corresponding
incidence matrix $H_i$, we use $\mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$
$(i = 1, 2, \ldots, m)$ to represent the $i$-th hypergraph, which contains
$|\mathcal{V}_i|$ vertices and $|\mathcal{E}_i|$ hyperedges. To alleviate the feature
over-smoothing caused by the convolutional operations, the residual connection
[4] can be adopted to generate the updated vertex representations for the next layer
of convolution, formulated as follows:


$$\tilde{X}_i = \sigma\left( (D^{v}_i)^{-1/2} H_i W_i (D^{e}_i)^{-1} H_i^{\top} (D^{v}_i)^{-1/2} X_i \Theta_i + X_i \right), \tag{8.8}$$

where $D^{v}_i \in \mathbb{R}^{|\mathcal{V}_i| \times |\mathcal{V}_i|}$ and
$D^{e}_i \in \mathbb{R}^{|\mathcal{E}_i| \times |\mathcal{E}_i|}$ are the degree matrices of
vertices and hyperedges. $W_i = \mathrm{diag}(w_1, w_2, \ldots, w_{|\mathcal{E}_i|})$ and
$\Theta_i \in \mathbb{R}^{d \times d}$ are the trainable hyperedge weights and the trainable
weight matrix for feature transformation, respectively.
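
A minimal NumPy sketch of the residual hypergraph convolution in Eq. (8.8) is given below. It assumes ReLU as the activation and a hypergraph without isolated vertices or empty hyperedges; the function name and the toy inputs are illustrative only.

```python
import numpy as np

def residual_hypergraph_conv(X, H, w, Theta):
    """One residual hypergraph convolution: sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta + X)."""
    d_v = H @ w                       # weighted vertex degrees
    d_e = H.sum(axis=0)               # hyperedge degrees
    inv_sqrt_dv = 1.0 / np.sqrt(d_v)
    A = (H * (w / d_e)) @ H.T         # H W De^{-1} H^T
    A = inv_sqrt_dv[:, None] * A * inv_sqrt_dv[None, :]
    return np.maximum(A @ X @ Theta + X, 0.0)   # ReLU is an assumed choice for sigma

H = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)      # 5 vertices, 4 hyperedges
w = np.ones(4)                                   # hyperedge weights (trainable in practice)
X = np.abs(np.random.default_rng(0).normal(size=(5, 3)))

out_identity = residual_hypergraph_conv(X, H, w, np.eye(3))
out_zero = residual_hypergraph_conv(X, H, w, np.zeros((3, 3)))
```

With a zero transformation matrix the layer passes its (non-negative) input through unchanged, which shows how the residual term keeps the original features available even when the aggregated signal is smoothed away.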

Note that here we assume that each sample has two hierarchical labels, named the
primary label and the secondary label, where the secondary label is a fine-grained
category of the primary label. One special component in this first step is the "vertex
belonging matrix," denoted as $T_i \in \mathbb{R}^{|\mathcal{V}_i| \times \mathcal{N}_2}$,
where $\mathcal{N}_2$ is the number of secondary labels. The matrix $T_i$ is generated from
the labels in the training set and serves as the input for the transductive learning method.

The global labels shared by all the subsets usually number in the
hundreds, which makes it feasible to combine the independently learned label features of
different groups. With the local latent high-order representations of the subsets obtained in
the previous hypergraph learning step, two aggregation operations can be conducted
here for the classification of primary and secondary labels, respectively. The aggregation
of local secondary labels can be formulated as follows:


$$S_i = T_i^{\top} \tilde{X}_i, \tag{8.9}$$


where $S_i \in \mathbb{R}^{\mathcal{N}_2 \times d}$ denotes the aggregated local
representation for secondary labels. Each row of the matrix $S_i$ represents the latent
feature of one specific secondary-label category in the $i$-th subset.

We then concatenate all of the local high-order vertex features $\tilde{X}_i$ to generate
the global high-order vertex features $\tilde{X} \in \mathbb{R}^{|\mathcal{V}| \times d}$ as
follows:


$$\tilde{X} = [\tilde{X}_1 \,\|\, \tilde{X}_2 \,\|\, \cdots \,\|\, \tilde{X}_m], \tag{8.10}$$

where $\cdot\|\cdot$ denotes the concatenating operation between two matrices. The local
aggregated secondary features $S_i \in \mathbb{R}^{\mathcal{N}_2 \times d}$ can be further
fused to form the global secondary features $S \in \mathbb{R}^{\mathcal{N}_2 \times d}$ by
average pooling, formulated as follows:


$$S = S_1 \oplus S_2 \oplus \cdots \oplus S_m, \tag{8.11}$$


where $\oplus$ denotes the average pooling operation, calculating the mean value of the
corresponding latent features from the local secondary labels.

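The global high-order representation of primary labels
($P \in \mathbb{R}^{\mathcal{N}_1 \times d}$, with $\mathcal{N}_1$ the number of primary
labels) is yielded from the global features of secondary labels, formulated as


$$P = \Phi S, \tag{8.12}$$


where $\Phi \in \mathbb{R}^{\mathcal{N}_1 \times \mathcal{N}_2}$ denotes the owning relations
between secondary and primary labels.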
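
The aggregation steps can be traced on a toy example with two subsets, three secondary labels, and two primary labels. The sketch assumes that the local aggregation is the product of the transposed vertex belonging matrix with the local vertex features (the form implied by the stated dimensions), that the vertex belonging matrix is one-hot, and that concatenation runs along the vertex axis; all variable names are illustrative.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Vertex belonging matrix: row v is the one-hot secondary label of vertex v."""
    return np.eye(n_classes)[np.asarray(labels)]

# Local vertex features of two subsets (3 vertices each, d = 2).
X_1 = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
X_2 = np.array([[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]])
T_1 = one_hot([0, 1, 2], n_classes=3)
T_2 = one_hot([0, 0, 2], n_classes=3)

# Local aggregation per secondary label (assumed form S_i = T_i^T X_i).
S_1, S_2 = T_1.T @ X_1, T_2.T @ X_2

# Global vertex features: concatenation along the vertex axis, as in Eq. (8.10).
X_global = np.concatenate([X_1, X_2], axis=0)

# Global secondary features: average pooling over the subsets, as in Eq. (8.11).
S = np.mean(np.stack([S_1, S_2]), axis=0)

# Primary label 0 owns secondary labels 0 and 1; primary label 1 owns secondary label 2.
Phi = np.array([[1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
P = Phi @ S                                   # Eq. (8.12)
```

Because only label-level matrices of size (number of labels) x d cross subset boundaries, the subsets can be processed independently before this step.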

Based on the results of the hypergraph convolution and global aggregation, the 
classifier consisting of the fully connected layers can be trained by concatenating 
the updated vertices’ high-order representations and the global classification. The 
augmented representations of vertices are shown below: 


y<l> _y. 1 pid j 
Xs = : | ur iE (8.13) 


D ox m > 
| zg 22j21 8j 


Then the aggregated features can be used for some tasks and trained with the 
hierarchical labels in training set. In the following, we introduce the hierarchical 
hypergraph learning in recommendation. 

Here, we introduce an application of hierarchical hypergraph learning to large
scale user retrieval intention detection. Figure 8.5 shows the layout, which mainly
consists of three steps: data division and local hypergraph modeling, latent
high-order feature aggregation, and user intention prediction.

First, we randomly and uniformly divide the original dataset into several subsets.
In this work, the query logs and the relationships among them
form the vertices and hyperedges. As shown in Fig. 8.5, the whole original dataset and
the divided subsets are denoted as $\mathcal{V}$ and $\mathcal{V}_i$, respectively, where
$i \in \{1, 2, \ldots, m\}$. In each subset, a sub-hypergraph can be constructed as
introduced above. Note that the initial semantic embeddings of vertices
($X_i \in \mathbb{R}^{|\mathcal{V}_i| \times d}$) are extracted
by well-known pre-trained models, such as BERT [18], where $d$ denotes the
dimension of the embeddings.

The hierarchical hypergraph learning can then be used to conduct the user
intention prediction. In this research, the user intentions are categorized into two
levels, i.e., the primary label and the secondary label, which is a fine-grained
category of the primary label. After applying the hierarchical model, the augmented features
$X^{\langle 1 \rangle}$ and $X^{\langle 2 \rangle}$ can be obtained.

In this application, the multi-class classification can be converted into multiple
binary classification problems to improve the effect of the model. We use
$\mathcal{C} = \{c_1, c_2, \ldots, c_N\}$, $N \in \{\mathcal{N}_1, \mathcal{N}_2\}$, to denote
the collection of the user intentions. The original multiple labels are thus converted into
two labels: 0 and 1. For instance, for the $l$-th intention, we traverse all the data and
assign label 1 to the samples with this intention and label 0 to the others. Each classifier
is trained using a multi-layer perceptron (MLP) and the sigmoid
activation function to implement label prediction based on the newly allocated
binary label, formulated as follows:

$$\begin{cases}
\hat{\mathcal{D}}_1 = \sigma\left(X^{\langle 1 \rangle} \Theta_{p1} + b_1\right) \\
\hat{\mathcal{D}}_2 = \sigma\left(X^{\langle 2 \rangle} \Theta_{p2} + b_2\right)
\end{cases}, \tag{8.14}$$


where $\Theta_{p1}$ and $\Theta_{p2}$ are the trainable transformation matrices, $b_1$ and
$b_2$ are the biases, and $\sigma$ is the activation function. $\hat{\mathcal{D}}_1$ and
$\hat{\mathcal{D}}_2$ denote the predictions of the primary
and secondary user intentions, respectively.

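To supervise and optimize the trainable parameters, we apply the cross-entropy
loss function in the training procedure:


$$\mathcal{L} = \mathrm{CE}(\hat{\mathcal{D}}_1, \mathcal{D}_1) + \mathrm{CE}(\hat{\mathcal{D}}_2, \mathcal{D}_2), \tag{8.15}$$


where $\mathcal{D}_1$ and $\mathcal{D}_2$ denote the ground truth of the primary and
secondary user intentions, respectively, and $\hat{\mathcal{D}}_1$ and $\hat{\mathcal{D}}_2$
are the corresponding predictions. When all of the classifiers have been trained completely,
each test sample can be predicted to obtain a list of scores for both primary and
secondary user intentions.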
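
The conversion to binary problems and the summed cross-entropy objective can be sketched as below. For brevity the sketch uses a single linear layer followed by a sigmoid instead of a full MLP, feeds the same toy feature matrix to both heads, and starts from zero weights; these simplifications, as well as all function names, are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_targets(labels, n_classes):
    """One binary (0/1) target column per intention: 1 if the sample has it, else 0."""
    return np.eye(n_classes)[np.asarray(labels)]

def predict(X, Theta, b):
    """One sigmoid output per intention; each column is an independent binary classifier."""
    return sigmoid(X @ Theta + b)

def binary_cross_entropy(pred, target, eps=1e-12):
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

X_aug = np.ones((4, 3))                             # toy augmented features for 4 samples
primary = binary_targets([0, 1, 1, 0], n_classes=2)
secondary = binary_targets([0, 2, 3, 1], n_classes=4)

pred_primary = predict(X_aug, np.zeros((3, 2)), np.zeros(2))
pred_secondary = predict(X_aug, np.zeros((3, 4)), np.zeros(4))

# Total objective: cross-entropy of the primary head plus that of the secondary head.
loss = binary_cross_entropy(pred_primary, primary) + binary_cross_entropy(pred_secondary, secondary)
```

With untrained (zero) weights every classifier outputs 0.5, so each head contributes log 2 to the loss; training lowers both terms jointly.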

To summarize, the hierarchical hypergraph learning method can handle large 
scale hypergraphs with hierarchical labels, which divides a dataset into multiple sub- 
hypergraphs, and hierarchical aggregation is performed based on hierarchical labels. 
The hierarchical hypergraph can integrate with the hypergraph neural network to 
handle millions of data points. 


8.4 Summary 


This chapter describes two kinds of large scale hypergraph computation methods, 
i.e., factorization-based hypergraph reduction and hierarchical hypergraph learn- 
ing. The factorization-based hypergraph reduction is based on the strategy of 
factorization, which decomposes the large scale hypergraph into low-dimensional 
embeddings of vertices and hyperedges. It can support the processing of hyper- 
graphs with nearly 10,000 vertices or hyperedges. The hierarchical hypergraph 
learning is used to analyze hypergraphs with hierarchical labels, which divides a 
dataset into multiple sub-hypergraphs, and hierarchical aggregation is performed 
based on hierarchical labels. This method can support millions of data points. We
also introduce two applications as examples, i.e., whole-slide image analysis and
recommendation, to illustrate the usage of these two algorithms in practice. There
are some other large scale hypergraph application scenarios, such as community
discovery [19] and spectral clustering [20].


Chapter 9
Hypergraph Computation for Social Media Analysis

Abstract Social media, such as Twitter and Weibo, have grown rapidly over the
past decade. Large numbers of active social media users produce a voluminous
amount of data each day, from which important insights can be drawn. Several
applications, such as recommender systems and sentiment analysis, have been
developed to help study users' intentions and profiles. One common challenge
faced by these social media applications is how to leverage the complex and
multi-modal data on social networks and model the higher-order associations hidden in
the data. Hypergraph computation has great potential for such analysis.
In this chapter, we introduce three typical applications of hypergraph computation,
i.e., recommender system, sentiment analysis, and emotion recognition, in which
hypergraph computation has shown great value for social media analysis.


9.1 Introduction 


With the fast development of information technologies, social media data have
increased rapidly. Social media platforms provide new ways to produce and receive
content, especially user-generated content. Users can shop, watch movies, and
instantly participate in the propagation, interaction, and sharing of news events on
the Internet. Rich behavioral data are generated on social media platforms by great
numbers of users every day; these data support different downstream applications and
provide insights for a better understanding of users' intentions and profiles.

A typical social media analysis application is the so-called recommender
system [1]. When listening to music, shopping, watching movies on the Internet, or
looking for friends on social network services, users are likely to be drowned in an
unprecedented amount of information. This is what we call "information overload."
To address this issue, recommender systems have been developed for decades. The
main goal of recommender systems is to forecast how users would react to an item
by better understanding their preferences based on the users' historical interaction
data, user profiles, item attributes, context data, and other information. This helps
predict whether a user will like an item or not. For example, in a movie
recommender system, the user profile may contain user ID, age, gender, income,
marital status, and more. The movie (item) attributes may include
movie ID, name, genre, director, release time, actors, and more. The interaction data
contain the movies that users have seen and possibly commented on. The goal
of the movie recommender system is to integrate this information to recommend
movies that users might like.

Another popular social media analysis application is sentiment analysis [2]. The
users on social media platforms have generated a large amount of opinion data
every moment in recent years, which helps to decode and mine users' attitudes
toward specific topics. Researchers have begun to look at sentiment analysis of users
in social media contexts. In economics, stock price fluctuations can be forecast to
some extent by analyzing the sentiment of social media users. In politics, social
media posts can reflect public opinions. Users' sentiments may also affect their
behaviors; for example, emotionally charged people are more likely to forward and
repost tweets. Therefore, sentiment analysis plays an important role in social media
analysis. However, sentiment analysis is challenging due to the multi-modality and
complexity of social media data. For example, a tweet may include text, images,
videos, and possibly more. Furthermore, there exist complex correlations among
posts in various aspects, such as time, location, and user preferences.
The interaction among users further increases the challenges in this task.

In addition to the posts on social media, physiological signals can also be used
to analyze the emotion of people [3]. Compared with text, facial expressions, and
other data, physiological signals are not easy to disguise and can better reflect the real
emotions of people. Therefore, emotion recognition based on physiological signals
plays an important role in many applications such as clinical diagnosis, and it can also
play a significant role in social media analysis when these data are available.
Physiological signals of different modalities contain complementary information
representations of human emotions. It is of great significance to discover and utilize
the correlations among these representations to improve the accuracy of emotion
recognition.

From the above examples, it can be readily seen that one important issue of
social media analysis is how to model complex correlations among data and to make
use of the complementary information among multi-modal representations to better
understand data. Hypergraphs have been widely used in social media computing in
recent years because of their usefulness in complex data modeling. In the following,
we discuss three applications of hypergraph computation in social media analysis:
recommender system, sentiment analysis, and emotion recognition. For the
recommender system, we discuss hypergraph-based collaborative filtering [4] and attribute
inference. We then present sentiment prediction [5] and social event detection using
hypergraph computation [6] for sentiment analysis. In the third section, we introduce
two different hypergraph computation methods of emotion recognition using
multi-modal physiological signals [7–9]. Part of the work introduced in this chapter has
been published in [4–9].


9.2 Recommender System


In recent years, the Internet has become an integral part of people's daily life:
shopping, watching the news, listening to music, and so on all take place online. However,
with the explosion of information, people find it increasingly difficult to sift through
the massive volumes of data on the Internet to access the needed information. For
example, when a user wants to watch a movie online and visits a movie site, the
user is likely to drown in thousands of movies and fail to find the one in mind. This
is called the "information overload" issue.

Recommender systems emerge under such circumstances. They are a powerful tool
for mitigating the problem of information overload, since
they can help users find useful information and help service providers boost
profits. Recommender systems have been used in many online systems, from general
platforms including e-commerce, social media, and content sharing to vertical
services such as movie, news, and music websites.

The core of the recommender system is to understand the users through their
attribute information and historical interactions and then predict whether they would
like a given item. It is worth noting that the user-side information, the item-side
information, and the interaction data all play a vital role in this process. The
user-side information, including gender, age, personality, etc., often reflects the
users' preferences. For example, male users may be more likely to read military
and political news, while female users may prefer fashion and entertainment news.
The item-side information, such as the category, text description, image, etc.,
can characterize the attributes of the item. Such attribute information may suggest
potential consumer groups. For instance, health supplements may be bought more
often by the elderly, while electronics are more likely to be purchased by younger
people. Historical interactions also reveal users' potential preferences, following
the assumption that "behaviorally similar users may have similar
preferences on items." Figure 9.1 shows an example of a recommender system based
on similar patterns.

We can see from these examples that what recommender systems actually do is
to identify similar users from different perspectives based on the given complex,
multi-modal data. Therefore, one key problem is how to model and learn the
complex relationship between users and items. Recently, hypergraph computation
has attracted much attention and has been applied to recommender systems to
help solve this problem. The hypergraph can naturally integrate the user-side
information, item-side information, as well as the interaction data, thanks to flexible
hyperedges and especially hyperedge groups. Therefore, similar users/items can be
connected in different aspects. In this section, we discuss two examples of applying
hypergraph computation in recommender systems, i.e., collaborative filtering and
attribute inference.

Fig. 9.1 An example of recommender system based on similar patterns


9.2.1 Collaborative Filtering 


In the past decades, collaborative filtering (CF), a crucial, popular recommendation 
technique, has been extensively used in various recommender systems. The funda- 
mental assumption of CF is that consumers who engage in similar behaviors, for 
example, reading the same kind of news frequently, are likely to have similar tastes 
for items, such as games, movies, and commodities. A common CF-based solution 
goes through the following two steps: first, it uses historical interactions to identify 
similar users and items; and second, it makes suggestions for users based on the 
information acquired in the last step. 
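
The two steps of a CF-based solution can be sketched with a tiny user-based example: cosine similarity between interaction rows identifies similar users, and unseen items are then scored by the similarity-weighted interactions of those users. This is a generic textbook-style baseline written for illustration, not the method introduced later in this section; the function name and the toy matrix are assumptions.

```python
import numpy as np

def user_based_scores(R):
    """Score unseen items for every user from the interactions of similar users."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = R / norms
    sim = unit @ unit.T                # step 1: cosine similarity between users
    np.fill_diagonal(sim, 0.0)         # a user is not its own neighbor
    scores = sim @ R                   # step 2: similarity-weighted item scores
    scores[R > 0] = -np.inf            # never recommend already-seen items
    return scores

# Rows are users, columns are items; 1 marks a historical interaction.
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)

scores = user_based_scores(R)
recommended_for_user0 = int(np.argmax(scores[0]))
```

User 0 shares two items with user 1 and none with user 2, so the only item that receives a positive score for user 0 is the one user 1 has additionally interacted with.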

Since users and items have topological links that a network can describe,
graph-based CF approaches have attracted a lot of interest in recent years. Although
graph-based CF approaches have been explored for a long time and have produced
respectable performance, there are still certain limitations. First, the high-order
correlations in the user-item network are modeled and utilized insufficiently.
For example, CF methods hope to find a group of behaviorally similar users. Such
associations between users are group-level (beyond pairwise) and cannot be well
captured by the graph structure, since only pairwise correlations can be modeled in
a graph. Second, when users and items are represented by a graph in graph-based
approaches, there are no fundamental distinctions between them. When an item has
many users connected to it, it is a popular item. In contrast, being connected to a
variety of items does not necessarily mean that a user is popular.

Under these circumstances, more adaptable and appropriate user and item
modeling is required. Thanks to its flexible hyperedges, the hypergraph structure,
as opposed to the graph structure, offers a more natural approach for representing
such high-order and intricate relationships. In this subsection, we present a dual
channel hypergraph collaborative filtering (DHCF) framework [4] to solve the
aforementioned problems. In the following, we introduce how to model the
user-item interactions and learn the high-order connectivity with dual hypergraphs.

Fig. 9.2 An example of hypergraph modeling for user-item network


(1) Hypergraph Modeling of High-Order Connectivity 

Given a user-item network, the high-order connectivity is captured by some self- 
defined association rules. Based on these rules, several hyperedge groups can be 
constructed, which can capture higher-order correlations rather than pairwise rela- 
tionships, e.g., by linking users who behave similarly but without direct connections. 
For example, we can connect the users who have purchased the same item with a 
hyperedge, as shown in Fig. 9.2. In addition to the interactions that are apparently 
visible in the observed data, these rules may also be thought of as a high-order 
perspective to describe the otherwise raw data. Here we introduce a way to capture 
the high-order connectivity with hypergraphs for users and items, separately. 


User Hypergraph Construction We first define the $k$-order reachable neighbors for
items. If there is a path between $item_a$ and $item_b$ that consists of a series of adjacent
vertices and contains fewer than $k$ users, then $item_a$ ($item_b$) is a $k$-order
reachable neighbor of $item_b$ ($item_a$) in the user-item network.


We then define the $k$-order reachable users for items. If $user_a$ is directly
connected to $item_a$, and $item_a$ is a $k$-order reachable neighbor of $item_b$, then
$user_a$ is a $k$-order reachable user of $item_b$.

The symbol $B_u^k(i)$ represents the set of $k$-order reachable users of item $i$. A
hypergraph can be defined mathematically as a set family in which each set indicates a
hyperedge. As a result, a hyperedge can be built from the $k$-order reachable user
set of an item. By using the above definitions, the corresponding hyperedge group
may be constructed as follows:

$$\mathcal{E}_{B_u^k} = \{ B_u^k(i) \mid i \in I \}. \tag{9.1}$$


The $k$-order accessible matrix of items is denoted by $A_i^k \in \{0, 1\}^{M \times M}$,
which can be written as follows:


$$A_i^k = \min(1, \mathrm{power}(H^{\top} \cdot H, k)), \tag{9.2}$$


where the function $\mathrm{power}(Q, k)$ computes the $k$-th power of a given matrix $Q$.
The incidence matrix of the user-item network is represented by
$H \in \{0, 1\}^{N \times M}$, where $N$ and $M$ are the numbers of users and items,
respectively. Then, the incidence matrix of the hyperedge group has the following form:


$$H_{B_u^k} = H \cdot A_i^{k-1}, \tag{9.3}$$


so that the 1-order group ($k = 1$, with $A_i^0 = I$) reduces to $H$ itself.


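The user hypergraph $\mathcal{G}_u$ can capture the overall high-order correlations among
users by fusing multiple hyperedge groups that are constructed via the $k$-order reachable
rule. Therefore, its incidence matrix $H_u$ can be written as


$$H_u = f(\underbrace{\mathcal{E}_{B_u^{k_1}}, \mathcal{E}_{B_u^{k_2}}, \ldots, \mathcal{E}_{B_u^{k_a}}}_{a}) = H_{B_u^{k_1}} \,\|\, H_{B_u^{k_2}} \,\|\, \cdots \,\|\, H_{B_u^{k_a}}, \tag{9.4}$$


where $\cdot\|\cdot$ is the concatenation operation, which is one example of the hyperedge
group fusion function $f(\cdot)$.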
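
The construction of the user hypergraph can be sketched on a three-user, two-item network. The sketch takes the $k$-th hyperedge group as $H$ multiplied by the $(k-1)$-order accessible matrix of items, an indexing assumption chosen so that the 1-order group equals $H$ and the result matches the matrix form $H \,\|\, H(H^{\top}H)$ used later in this section up to the clipping in Eq. (9.2). The function names are illustrative.

```python
import numpy as np

def accessible_matrix(H, k):
    """k-order accessible matrix of items: min(1, (H^T H)^k); k = 0 gives the identity."""
    return np.minimum(1, np.linalg.matrix_power(H.T @ H, k))

def user_hyperedge_group(H, k):
    """Incidence matrix of the k-order hyperedge group on users (assumed indexing k - 1)."""
    return H @ accessible_matrix(H, k - 1)

# Users 0 and 2 share no item directly, but both are linked to user 1.
H = np.array([[1, 0],
              [1, 1],
              [0, 1]])

# Fuse the 1-order and 2-order groups by concatenation, as in Eq. (9.4).
H_u = np.concatenate([user_hyperedge_group(H, 1), user_hyperedge_group(H, 2)], axis=1)
```

In the 2-order columns, users 0 and 2 fall into the same hyperedges even though they never interacted with the same item, which is the kind of beyond-pairwise link the hyperedge groups are meant to capture.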


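Item Hypergraph Construction Here the high-order connectivity for items is
defined in a similar way. The $k$-order accessible matrix of users
$A_u^k \in \{0, 1\}^{N \times N}$ is defined as


$$A_u^k = \min(1, \mathrm{power}(H \cdot H^{\top}, k)). \tag{9.5}$$


The incidence matrix $H_{B_i^k} \in \{0, 1\}^{M \times N}$ can be written as


$$H_{B_i^k} = H^{\top} \cdot A_u^{k-1}. \tag{9.6}$$


By assuming that we have $b$ hyperedge groups, the item hypergraph incidence
matrix $H_i$ is similarly formulated as follows:


$$H_i = f(\underbrace{\mathcal{E}_{B_i^{k_1}}, \mathcal{E}_{B_i^{k_2}}, \ldots, \mathcal{E}_{B_i^{k_b}}}_{b}) = H_{B_i^{k_1}} \,\|\, H_{B_i^{k_2}} \,\|\, \cdots \,\|\, H_{B_i^{k_b}}. \tag{9.7}$$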


In this way, the high-order connectivity for both users and items is captured with
a hypergraph. Figure 9.3 gives one example of the defined high-order connectivity
for users [4]. Subsequently, two embedding look-up tables
($E_u = [e_{u_1}, \ldots, e_{u_N}]$ and $E_i = [e_{i_1}, \ldots, e_{i_M}]$) are constructed to
describe both users and items, which, together with the hypergraph structure, are prepared
for later learning.

Fig. 9.3 The illustration of high-order connectivity for users


(2) High-Order Information Passing 

When mixed high-order correlations have been obtained, the neighboring messages
are aggregated using the high-order information passing technique, which can be
expressed as


$$\begin{cases}
M_u = \mathrm{HyConv}(E_u, H_u) \\
M_i = \mathrm{HyConv}(E_i, H_i)
\end{cases}, \tag{9.8}$$

where $\mathrm{HyConv}(\cdot, \cdot)$ can be any hypergraph convolution operation, such as
that specified in HGNN (HGNNConv for short). Through information passing from high-order
neighbors, the complex correlations between vertices are encoded into the
aggregated messages of users ($M_u$) and items ($M_i$), respectively. It should be noted
that the high-order neighbor mentioned here is not a fixed concept tied to the direct
interactions in the user-item network, but an abstract description that can link
similar users/items in the latent behavior-attribute space.

To provide an example of high-order information passing, we present the
jump hypergraph convolution (JHyConv) in this part. Inspired by some previous
work [10], the JHyConv operator creates the learned representations by
combining a vertex's current representation with its aggregated neighborhood
representation. The JHyConv is written as


$$X^{(l+1)} = \sigma\left(D_v^{-1/2} H D_e^{-1} H^{\top} D_v^{-1/2} X^{(l)} \Theta^{(l)} + X^{(l)}\right), \tag{9.9}$$


where all symbols follow existing notations consistently.

In contrast to the conventional HGNNConv, the jump hypergraph convolution
enables the model to take into account both a vertex's own representation and the aggregated
high-order representations. The messages $M_u$ and $M_i$ are then used to jointly update
$E_u$ and $E_i$.


(3) Joint Information Updating
The goal of the joint information updating is to extract information that is
discriminative for users and items, which is formulated by

$$\begin{cases}
E_u' = \mathrm{JMU}(M_u, M_i) \\
E_i' = \mathrm{JMU}(M_i, M_u)
\end{cases}, \tag{9.10}$$


where any learnable feed-forward neural network may be used for $\mathrm{JMU}(\cdot, \cdot)$.
The updated embeddings for users and items are denoted as $E_u'$ and $E_i'$, respectively.
Here, a shared fully connected layer is applied.


(4) Overall DHCF Layer 
The two stages of the DHCF framework are illustrated in Figs. 9.4 and 9.5, respectively.
High-order information passing and joint information updating together constitute an
integrated DHCF layer, which, thanks to its powerful hypergraph structure, can
directly model and encode high-order connectivity.

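With the specified HyConv and JMU, a DHCF configuration can be formulated
as follows:

$$
\begin{cases}
f(\cdot, \cdot) = \cdot \,\Vert\, \cdot \\
\mathrm{HyConv}(\cdot, \cdot) = \mathrm{JHyConv}(\cdot, \cdot) \\
\mathrm{JMU}(\cdot, \cdot) = \mathrm{MLP}_1(\cdot)
\end{cases}
, \tag{9.11}
$$

where $\mathrm{MLP}_1(\cdot)$ is a fully connected layer, $\Theta$ denotes the trainable parameters,
and $\cdot \Vert \cdot$ is the concatenation operation.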
Fig. 9.4 The first stage of the DHCF framework

Fig. 9.5 The second stage of the DHCF framework (Phase 2: joint information updating; the
figure legend marks matrix transpose, element-wise addition, and matrix multiplication)
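The matrix form of the embedding propagation on the hypergraph can be written as
follows:

$$
\begin{aligned}
&\text{Hypergraph setup:} &&
\begin{cases}
H_u = H \,\Vert\, \left( H (H^\top H) \right) \\
H_i = H^\top \,\Vert\, \left( H^\top (H H^\top) \right)
\end{cases} \\
&\text{Phase 1:} &&
\begin{cases}
M_u^{(l)} = D_{u_v}^{-1/2} H_u D_{u_e}^{-1} H_u^\top D_{u_v}^{-1/2} E_u^{(l)} + E_u^{(l)} \\
M_i^{(l)} = D_{i_v}^{-1/2} H_i D_{i_e}^{-1} H_i^\top D_{i_v}^{-1/2} E_i^{(l)} + E_i^{(l)}
\end{cases} \\
&\text{Phase 2:} &&
\begin{cases}
E_u^{(l+1)} = \sigma\left( M_u^{(l)} \Theta^{(l)} \right) \\
E_i^{(l+1)} = \sigma\left( M_i^{(l)} \Theta^{(l)} \right)
\end{cases}
\end{aligned}
\tag{9.12}
$$

where $D_{u_v}$, $D_{u_e}$ and $D_{i_v}$, $D_{i_e}$ are the vertex degree and hyperedge degree matrices of
the user hypergraph $H_u$ and the item hypergraph $H_i$, respectively. $E_u^{(l)}$ and $E_i^{(l)}$ are the
inputs for layer $l$, while $E_u^{(l+1)}$ and $E_i^{(l+1)}$ are the outputs for layer $l$.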


With the introduced framework, the collaborative signals in the user-item 
network are modeled and captured, thus achieving better representation. 
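
The hypergraph setup step, which concatenates the direct interaction matrix with its higher-order extension, can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and clipping the higher-order block to 0/1 (`binarize`) is an assumption added here so that the result stays a binary incidence matrix; the formula in the text concatenates the raw product.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def binarize(A):
    """Clip positive counts to 1 (an assumption, see the lead-in)."""
    return [[1 if x > 0 else 0 for x in row] for row in A]

def concat_columns(A, B):
    """Column-wise concatenation A || B: each row gains the hyperedges of B."""
    return [ra + rb for ra, rb in zip(A, B)]

def build_user_item_hypergraphs(H):
    """H: user-item interaction matrix (n_users x n_items), entries in {0, 1}."""
    Ht = transpose(H)
    # H (H^T H): items reachable through user -> item -> user -> item paths
    H_u = concat_columns(H, binarize(matmul(H, matmul(Ht, H))))
    # H^T (H H^T): users reachable through item -> user -> item -> user paths
    H_i = concat_columns(Ht, binarize(matmul(Ht, matmul(H, Ht))))
    return H_u, H_i
```

For two users with disjoint item sets the higher-order block simply repeats the direct interactions, while overlapping histories add new vertex-hyperedge memberships.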


9.2.2 Attribute Inference 


A CF-based recommender system suffers from the cold-start problem when historical
behavior data of users are lacking, which makes it challenging to personalize
recommendations for individual users. Making use of user and item attribute data is
a potential answer to this issue. The attribute information of users usually includes
gender, age, occupation, etc. The attribute information of an item can be the genre
of a movie or a piece of music, the category of an item on an e-commerce website,
etc. According to the principle of CF, similar users will choose similar items,
and the attribute information can then be used to establish the similarity between
users or items. Attribute information can thus build up associations between users
and items even in the absence of historical user behaviors, which alleviates the
cold-start problem. In other words, attribute information can assist
collaborative filtering.

However, attribute information is often insufficient, as many people are reluctant
to provide true personal information. Therefore, attribute inference becomes an
important task. Attribute inference and recommendation are mutually reinforcing:
high-quality attributes improve collaborative filtering, while more accurate
modeling of user behavior helps infer the attributes of users and items.

In this section, we discuss a multi-task learning framework that combines the
attribute inference task with the recommendation task. The framework first utilizes
multi-channel hypergraph CF for representation learning, then performs the two
downstream tasks simultaneously, and lastly optimizes the model through the
downstream tasks. The pipeline of the framework is presented in Fig. 9.6.

Fig. 9.6 The pipeline of multi-channel hypergraph neural networks for recommendation and
attribute inference


(1) Multi-Channel Hypergraph Collaborative Filtering 


Multi-Channel Hypergraph Construction In order to model the higher-order
interactions and attributes between users and items, two hypergraphs, named
Interaction Hypergraph and Attribute Hypergraph, are constructed and denoted as $\mathcal{I}$
and $\mathcal{A}$ for simplicity.

The structure of $\mathcal{I}$ is generated through the interactions between users and items.
The implicit interaction matrix is represented as $R \in \mathbb{R}^{n_u \times n_v}$, where $n_u$ and
$n_v$ denote the numbers of users and items, respectively. With the k-order reachable rule
introduced in the previous subsection, we generate the hyperedges by connecting
the 1-order reachable users and items of each user and item. The incidence matrices can
be expressed as


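$$
H_u^{\mathcal{I}}(i, j) =
\begin{cases}
1 & \text{user}_i \text{ interacted with item}_j \\
0 & \text{otherwise,}
\end{cases}
\qquad
H_v^{\mathcal{I}}(i, j) =
\begin{cases}
1 & \text{item}_i \text{ interacted with user}_j \\
0 & \text{otherwise.}
\end{cases}
\tag{9.13}
$$

It is obvious that $H_u^{\mathcal{I}} = R$ and $H_v^{\mathcal{I}} = R^\top$.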


The structure of $\mathcal{A}$ is generated through the attribute information of users and
items. The user and item binary attribute matrices are denoted by $X \in \mathbb{R}^{n_u \times n_p}$ and
$Y \in \mathbb{R}^{n_v \times n_q}$, where $n_p$ and $n_q$ denote the numbers of user and item attributes,
respectively. Attributes represent hyperedges, and vertices with the same attributes are
connected by hyperedges. The incidence matrices can be formulated as

$$
H_u^{\mathcal{A}}(i, j) =
\begin{cases}
1 & \text{user}_i \text{ has attribute}_j \\
0 & \text{otherwise,}
\end{cases}
\qquad
H_v^{\mathcal{A}}(i, j) =
\begin{cases}
1 & \text{item}_i \text{ has attribute}_j \\
0 & \text{otherwise.}
\end{cases}
\tag{9.14}
$$

Here we have $H_u^{\mathcal{A}} = X$ and $H_v^{\mathcal{A}} = Y$.
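
As a small illustration of the two constructions, the sketch below builds the interaction and attribute incidence matrices from raw records. The function names and input formats (index pairs and per-vertex attribute sets) are hypothetical choices made for this example.

```python
def interaction_incidence(interactions, n_users, n_items):
    """Build the interaction-channel incidence matrices.

    interactions: iterable of (user_index, item_index) pairs.
    Returns (H_u, H_v): H_u has users as vertices and items as hyperedges
    (it equals R); H_v has items as vertices and users as hyperedges (R^T).
    """
    R = [[0] * n_items for _ in range(n_users)]
    for u, i in interactions:
        R[u][i] = 1
    H_u = R
    H_v = [list(col) for col in zip(*R)]
    return H_u, H_v

def attribute_incidence(attrs, n_attrs):
    """Build an attribute-channel incidence matrix.

    attrs[v] is the set of attribute ids held by vertex v; every attribute
    acts as one hyperedge joining all vertices that share it.
    """
    return [[1 if a in held else 0 for a in range(n_attrs)] for held in attrs]
```

A user with no recorded interactions still gets a non-empty row in the attribute channel, which is how the attribute hypergraph helps with cold-start vertices.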


Multi-Channel Hypergraph Learning When the hypergraph structure has been
generated, the multi-channel hypergraph convolution is performed separately on each
channel. It can be written as

$$
X^{(k+1)} = \sigma\left( D_v^{-1/2} H D_e^{-1} H^\top D_v^{-1/2} X^{(k)} \right), \tag{9.15}
$$

where $X^{(k)}$ denotes the vertex embeddings after $k$ layers of convolution, and it should
be replaced by $U_c^{(k)}$ and $V_c^{(k)}$ for user and item embeddings on channel $c \in \{\mathcal{A}, \mathcal{I}\}$ in
our case. To bypass the over-smoothing problem, the results obtained from $K$-layer
propagation are averaged as below:

$$
U_c = \frac{1}{K+1} \sum_{k=0}^{K} U_c^{(k)}, \qquad V_c = \frac{1}{K+1} \sum_{k=0}^{K} V_c^{(k)}. \tag{9.16}
$$


Moreover, to aggregate information from different channels, a channel attention 
mechanism is leveraged to generate the comprehensive user and item embeddings. 
It is defined as 


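$$
\alpha_u^c = f_a(U_c) = \frac{\exp\left( a^\top \cdot W_a U_c \right)}{\sum_{c'} \exp\left( a^\top \cdot W_a U_{c'} \right)}, \tag{9.17}
$$

$$
\alpha_v^c = f_a(V_c) = \frac{\exp\left( a^\top \cdot W_a V_c \right)}{\sum_{c'} \exp\left( a^\top \cdot W_a V_{c'} \right)}, \tag{9.18}
$$

where $W_a \in \mathbb{R}^{d \times d}$ is the trainable parameter, and $d$ denotes the embedding
dimension. The comprehensive representations can be formulated as

$$
U^* = \sum_{c} \alpha_u^c U_c, \qquad V^* = \sum_{c} \alpha_v^c V_c, \tag{9.19}
$$

where $c \in \{\mathcal{A}_u, \mathcal{I}_u, \mathcal{A}_v, \mathcal{I}_v\}$.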
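
The channel attention step is a softmax over per-channel scores followed by a weighted sum. The sketch below works on the embedding of a single vertex and assumes the score of a channel is $a^\top W x_c$; the attention vector `a`, the function name, and the per-vertex treatment are assumptions made for illustration.

```python
import math

def channel_attention(channel_embs, a, W):
    """Fuse the per-channel embeddings of one vertex (illustrative sketch).

    channel_embs: dict mapping channel name -> embedding (list of d floats)
    a:            attention vector of length d
    W:            d x d trainable matrix (list of rows)
    Returns (alpha, fused): attention weights per channel and the fused embedding.
    """
    def score(x):
        Wx = [sum(W[r][k] * x[k] for k in range(len(x))) for r in range(len(W))]
        return sum(ai * wi for ai, wi in zip(a, Wx))

    scores = {c: score(x) for c, x in channel_embs.items()}
    top = max(scores.values())                       # subtract max for stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    alpha = {c: e / total for c, e in exps.items()}  # softmax over channels

    d = len(a)
    fused = [sum(alpha[c] * channel_embs[c][k] for c in channel_embs)
             for k in range(d)]
    return alpha, fused
```

Because the weights are normalized across channels, a channel whose embedding aligns better with `a` after the transform contributes more to the fused representation.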
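The graph convolution is leveraged in order to further exploit the interaction data
between users and items. It can be formulated as

$$
\begin{bmatrix} U^{(j+1)} \\ V^{(j+1)} \end{bmatrix}
= D^{-1/2} \begin{bmatrix} 0 & R \\ R^\top & 0 \end{bmatrix} D^{-1/2}
\begin{bmatrix} U^{(j)} \\ V^{(j)} \end{bmatrix}, \tag{9.20}
$$

$$
\hat{U} = \frac{1}{J+1} \sum_{j=0}^{J} U^{(j)}, \qquad \hat{V} = \frac{1}{J+1} \sum_{j=0}^{J} V^{(j)}, \tag{9.21}
$$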
1=0 


where J is the number of graph convolution layers. 


(2) Recommendation Task and Attribute Inference Task 
Following the representation learning through multi-channel hypergraph collaborative
filtering, the two downstream tasks can be performed simultaneously.

First, based on the idea of matrix factorization, the user-item interactions can
be predicted as

$$
\hat{R} = \hat{U} \hat{V}^\top. \tag{9.22}
$$


Next, we consider the nature of the relationship between attributes and vertices,
and a method of attribute inference is discussed. Also inspired by matrix
factorization, the attribute matrix can be regarded as the product of two low-rank
matrices. It can be formulated as

$$
\hat{X} = \hat{U} P^\top, \qquad \hat{Y} = \hat{V} Q^\top, \tag{9.23}
$$

where $P \in \mathbb{R}^{n_p \times d}$ and $Q \in \mathbb{R}^{n_q \times d}$ are the user and item attribute representations.
The use of matrix factorization for attribute inference is reasonable because
attributes are influenced both by the properties of vertices and by the properties of the
attributes themselves; one cannot be presented without the other. In conclusion, the benefit
of processing the two distinct tasks concurrently with this method is that it permits
information sharing while allowing a high degree of autonomy between the two
training activities.


(3) Joint Optimization 

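A pairwise loss called Bayesian Personalized Ranking (BPR) encourages the predictions
for observed behaviors to be higher than those for unobserved ones, and it is utilized to
optimize the recommendation task. It can be written as

$$
\mathcal{L}_r = -\sum_{i} \sum_{j \in \mathcal{S}(i)} \sum_{k \notin \mathcal{S}(i)} \log \sigma\left( \hat{r}_{ij} - \hat{r}_{ik} \right) + \lambda \|\Theta_r\|_2^2, \tag{9.24}
$$

where $\mathcal{S}(i)$ is the set of items that user$_i$ has interacted with, $\Theta_r$ represents the
model parameters, and $\hat{r}_{ij} = u_i^\top v_j$ represents the predicted preference of user$_i$ for
item$_j$. The sigmoid function is denoted as $\sigma(\cdot)$.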
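
A direct, unoptimized evaluation of the BPR objective makes the pairwise structure explicit. This sketch enumerates every (observed, unobserved) item pair per user, whereas practical implementations usually sample negatives; the function name, the dictionary format of `observed`, and the `reg` argument are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bpr_loss(U, V, observed, reg=0.0):
    """Bayesian Personalized Ranking loss over all (observed, unobserved) pairs.

    U, V:      user and item embeddings (lists of equal-length float lists)
    observed:  dict mapping user index -> set of item indices interacted with
    reg:       weight of the squared L2 penalty on all embeddings
    """
    loss = 0.0
    for i, pos_items in observed.items():
        for j in pos_items:
            for k in range(len(V)):
                if k in pos_items:
                    continue                      # k must be an unobserved item
                diff = dot(U[i], V[j]) - dot(U[i], V[k])
                loss += -math.log(1.0 / (1.0 + math.exp(-diff)))
    l2 = sum(x * x for row in U + V for x in row)
    return loss + reg * l2
```

The loss shrinks when observed items are scored above unobserved ones and grows when the ranking is reversed.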
Next, the attribute inference task can be regarded as an attribute category
classification problem. The cross-entropy loss is then leveraged for optimizing the
attribute inference task. It can be written as

$$
\begin{aligned}
\mathcal{L}_a^u &= -\sum_{i} \sum_{j=1}^{n_p} x_{ij} \log\left( \hat{x}_{ij} \right), \\
\mathcal{L}_a^v &= -\sum_{i} \sum_{j=1}^{n_q} y_{ij} \log\left( \hat{y}_{ij} \right), \\
\mathcal{L}_a &= \mathcal{L}_a^u + \mathcal{L}_a^v,
\end{aligned}
\tag{9.25}
$$

where $\hat{x}_{ij} = u_i^\top p_j$ is the inference score of user$_i$ on user attribute$_j$, and
$\hat{y}_{ij} = v_i^\top q_j$ is the inference score of item$_i$ on item attribute$_j$.

Finally, the overall loss is the weighted sum of the losses from the two tasks. It can be
written as

$$
\mathcal{L} = \mathcal{L}_r + \gamma \cdot \mathcal{L}_a, \tag{9.26}
$$

where $\gamma$ is the hyperparameter for balancing the two different losses.
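
The attribute loss and the joint objective can be sketched as follows. Passing the inner-product score through a sigmoid so that the logarithm is well defined is an assumption of this sketch, as are the function names and the small `eps` guard.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attribute_loss(E, A, labels, eps=1e-12):
    """Cross-entropy over the attributes each vertex actually has.

    E:      vertex embeddings (users or items)
    A:      attribute embeddings (rows of P or Q)
    labels: binary matrix, labels[i][j] = 1 if vertex i has attribute j
    """
    loss = 0.0
    for i, e in enumerate(E):
        for j, a in enumerate(A):
            if labels[i][j]:
                score = sigmoid(sum(x * y for x, y in zip(e, a)))
                loss -= math.log(score + eps)
    return loss

def joint_loss(loss_rec, loss_attr_user, loss_attr_item, gamma):
    """Overall objective: recommendation loss plus weighted attribute losses."""
    return loss_rec + gamma * (loss_attr_user + loss_attr_item)
```

Setting `gamma` to zero recovers pure recommendation training, while larger values push the shared embeddings toward explaining the attributes.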

Although in this section we only discuss two instances, i.e., collaborative filtering
and attribute inference, applications of hypergraph computation in recommender
systems do not end there. In collaborative filtering, only the historical interaction data
are utilized, and the hypergraph is constructed upon the similarity of users/items in
the behavior space. In attribute inference, the attribute information of users and items
is further utilized to solve the cold-start problem. In this case, the hypergraph is
constructed based on both behaviors and attributes. In addition to the behavior and
attribute data, context data, such as the time, location, and weather, can also be
integrated, and hypergraphs can also be applied to model the complex correlations
among these data. Also, the user-item network can sometimes be multiplex, that is,
there may exist various kinds of interactions between users and items, e.g., a user
may view, click, and purchase an item. How to adopt the hypergraph to model such
multiplex connections also remains to be explored.


9.3 Sentiment Analysis 


The emergence of Twitter and Sina Weibo has given social media users a place to 
share their thoughts and emotions about particular occurrences. At the same time, 
this information is rapidly and widely disseminated throughout social networks. 
Therefore, how to analyze the information in social media becomes an important 
issue. 

First, in terms of the sentiment dimension, microblog sentiment research has
numerous potential applications in event monitoring, social network analysis, and
business advice. By analyzing the sentiment of massive data, we can get the emotional
attitude of netizens toward relevant events. Second, in terms of the temporal
dimension, real-time multimedia data may travel quickly and widely throughout the
social network, having a significant impact on society. Therefore, efficient real-time
temporal detection can help government organizations with macroeconomic control
and huge corporations with marketing management.

There are multi-modal data among Twitter data, including text, images, emojis,
videos, etc. The higher-order associations between different modalities can be
well modeled by hypergraphs to extract sentiment information. In the following
subsections, we provide two examples that use hypergraphs to analyze the sentiment
of microblog data in these two dimensions, respectively [5, 6].


9.3.1 Sentiment Prediction 


Predicting the multi-modal sentiment of tweets is not an easy task. Most sentiment
analysis models focus on textual or visual channels only. However, in human
emotional perception, different moods have their own characteristics, so sentiment
analysis should be based on multiple perspectives. Even with multi-channel data,
it is uncertain whether the emotions of different channels are related. Moreover,
there are cases where some channels are missing. To address these problems, a
two-layer multi-modal hypergraph learning framework [5] is introduced for
multi-modal sentiment prediction.

This framework's objective is to forecast the sentiment of given multi-modal
microblog data (e.g., a Weibo tweet) that include text, visuals, and emoticons.
The bag-of-textual-words feature $F_i^{t} = \{w_1^t, \ldots, w_{n_t}^t\}$ is extracted for the textual
modality. The visual modality feature $F_i^{v} = \{w_1^v, \ldots, w_{n_v}^v\}$ is extracted from
the $i$-th image. Furthermore, an emoticon dictionary is defined for the emoticon
modality, which forms the bag-of-emoticon-words feature $F_i^{e} = \{w_1^e, \ldots, w_{n_e}^e\}$.
Corresponding sentiment scores $s^t$, $s^v$, and $s^e$ are assigned to $w^t$, $w^v$, and $w^e$, respectively.
Consequently, the tweet $x_i$ can be denoted as $\{F_i^{t}, F_i^{v}, F_i^{e}\}$. Through
investigating $F_i^{t}$, $F_i^{v}$, and $F_i^{e}$ simultaneously, the sentiment of $x_i$ can be
predicted.


(1) Multi-Modal Hypergraph Learning 

To create the incidence matrix of the hypergraph, the correlation between each tweet
and the "centroid" tweets of various modalities is first computed. Each tweet is
treated as a vertex, and each hyperedge connects a centroid tweet with its $k$ nearest
neighbors in one modality. It is important to note that each vertex can serve as a centroid.
The incidence matrix can be defined as

$$
H(v_i, e_j) =
\begin{cases}
s(j, i) & \text{if } v_i \in e_j \\
0 & \text{otherwise,}
\end{cases}
\tag{9.27}
$$

where $s(j, i) = \exp\left( -\frac{\mathrm{dist}(i, j)^2}{\sigma \bar{d}^2} \right)$ is the correlation between $v_i$ and $e_j$, $\mathrm{dist}(i, j)$ is
the Euclidean distance between $v_i$ and the centroid vertex of $e_j$, $\bar{d}$ is the
average pairwise distance for the corresponding modality, and the parameter $\sigma$ is
empirically set to adjust the normalization of the tweet relevance. Each hyperedge's
weight is initialized to 1.
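
The construction for a single modality can be sketched as below, with one hyperedge per centroid vertex. The Gaussian form of $s(j, i)$ with the squared distance scaled by $\sigma \bar{d}^2$ is the reading assumed here, and the function names are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_incidence(features, k, sigma=1.0):
    """Probabilistic incidence matrix for one modality (illustrative sketch).

    features: one feature vector per tweet (vertex)
    k:        number of nearest neighbors joined to each centroid
    Returns H with H[i][j] = s(j, i) if vertex i lies in the hyperedge centred
    on vertex j, and 0.0 otherwise.
    """
    n = len(features)
    dist = [[euclidean(features[i], features[j]) for j in range(n)]
            for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    d_bar = sum(pairs) / len(pairs)               # average pairwise distance

    H = [[0.0] * n for _ in range(n)]
    for j in range(n):                            # vertex j acts as the centroid
        members = sorted(range(n), key=lambda i: dist[i][j])[:k + 1]
        for i in members:                         # centroid itself plus k neighbors
            H[i][j] = math.exp(-dist[i][j] ** 2 / (sigma * d_bar ** 2))
    return H
```

Repeating this for the textual, visual, and emoticon features and concatenating the resulting matrices column-wise gives one multi-modal incidence matrix.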

In multi-modal hypergraph learning (MHG) [5], guided inference is used to 
perform hypergraph learning. It calculates the relevance scores of tweets with 
varying attitudes by iteratively updating the relevance score vector f and the 
hyperedge weights W. It accomplishes the aforementioned objectives by optimizing 
the loss functions: 


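$$
\begin{aligned}
\arg\min_{f, W}\ & \left\{ \Omega(f) + \lambda \mathcal{R}_{emp}(f) + \mu \sum_{i=1}^{n_e} w_i^2 \right\} \\
\text{s.t.}\ & \sum_{i=1}^{n_e} w_i = 1,
\end{aligned}
\tag{9.28}
$$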


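where $f$ is the learned relevance score, $\Omega(f)$ is a regularizer built on the normalized
hypergraph Laplacian, $\mathcal{R}_{emp}(f) = \|f - y\|^2$ denotes the empirical loss, and $\sum_{i=1}^{n_e} w_i^2$
is the $\ell_2$ regularizer on the hyperedge weights. $\Omega(f)$ can be formulated as

$$
\Omega(f) = \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{w(e) h(u, e) h(v, e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^2, \tag{9.29}
$$

where $d(v) = \sum_{e \in \mathcal{E}} w(e) h(v, e)$ denotes the vertex degree and $\delta(e) = \sum_{v \in \mathcal{V}} h(v, e)$
denotes the hyperedge degree. The diagonal matrices of $d(v)$ and $\delta(e)$ are represented
as $D_v$ and $D_e$, respectively. Let $\Theta = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$ and let $\Delta = I - \Theta$ be
the hypergraph Laplacian. The normalized cost function can be expressed as

$$
\Omega(f) = f^\top \Delta f. \tag{9.30}
$$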


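The two parameters $W$ and $f$ are optimized iteratively using the following two
functions:

$$
\arg\min_{f} \Phi(f) = \arg\min_{f} \left\{ f^\top \Delta f + \lambda \|f - y\|^2 \right\}, \tag{9.31}
$$

$$
\begin{aligned}
\arg\min_{W} \Phi(W) &= \arg\min_{W} \left\{ f^\top \Delta f + \mu \sum_{i=1}^{n_e} w_i^2 \right\}, \\
\text{s.t.}\ & \sum_{i=1}^{n_e} w_i = 1.
\end{aligned}
\tag{9.32}
$$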
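
As a worked step, the $f$-subproblem (9.31) admits a closed-form minimizer. The derivation below is a sketch under the assumptions that $W$ is held fixed (so that $\Delta$ is a constant symmetric positive semi-definite matrix) and that $\lambda > 0$:

```latex
\frac{\partial \Phi}{\partial f} = 2 \Delta f + 2 \lambda (f - y) = 0
\quad \Longrightarrow \quad
(\Delta + \lambda I) f = \lambda y
\quad \Longrightarrow \quad
f = \left( I + \frac{1}{\lambda} \Delta \right)^{-1} y .
```

The inverse exists because $I + \Delta / \lambda$ is positive definite. The weight update (9.32) is then solved with $f$ fixed, and the two steps alternate.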
As shown above, MHG models the sample-sample relation for the purpose of
hypergraph construction. The properties of the modalities and their relevance to one
another, however, are not fully utilized.


(2) Dual-Layer Multi-Modal Hypergraph Learning 

Dual-layer multi-modal hypergraph learning is composed of two hypergraph layers,
$\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1, W)$ for the tweet-level hypergraph and $\mathcal{G}_2 = (\mathcal{V}_2, \mathcal{E}_2, M)$ for the
feature-level hypergraph.

To allow multi-modal features to be adopted more explicitly and to directly 
construct multi-modal hypergraphs for modal correlation, each hypergraph layer 
of dual-layer multi-modal hypergraph learning uses relations between vertex and 
hyperedge to represent sample features or relations between features and samples, 
rather than relations between samples in MHG. 

The sentiment label vector of tweets and the sentiment label vector of multi-modal
sentiment words are denoted by $y$ and $t$ in the respective hypergraph
layers. Accordingly, in the two hypergraph layers, $f$ and $g$ are the
vectors representing the relevance scores of tweets and multi-modal features/words,
respectively. $M$ can be regarded as the confidence ratings of the
sentiment labels $y$, which correspond to $f$ in the tweet-level hypergraph. The two
hypergraph layers are connected, and the multi-modal relevance of features is
transferred to the tweet-level hypergraph in order to help predict tweet sentiment.

The probabilistic incidence matrix of a hypergraph is written as

$$
H_x(v_i, e_j) =
\begin{cases}
1 & \text{if } v_i \in e_j \\
0 & \text{otherwise,}
\end{cases}
\tag{9.33}
$$

where $x$ denotes either 1 or 2, and the same applies below.


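The following loss function can be optimized to represent the learning process:

$$
\begin{aligned}
\arg\min_{f, W, g, M}\ & \left\{ \Omega_1(f) + \lambda_1 \mathcal{R}_{emp1}(f) + \mu_1 \sum_{i=1}^{n_{e1}} W_i^2 + \Omega_2(g) + \lambda_2 \mathcal{R}_{emp2}(g) + \mu_2 \sum_{i=1}^{n_{e2}} M_i^2 \right\} \\
\text{s.t.}\ & \sum_{i=1}^{n_{e1}} W_i = 1, \quad \sum_{i=1}^{n_{e2}} M_i = 1,
\end{aligned}
\tag{9.34}
$$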


where $\Omega_1(f)$ and $\Omega_2(g)$ are regularizers based on the normalized hypergraph
Laplacian, $\mathcal{R}_{emp1}(f) = \|f - y \circ M\|^2$ and $\mathcal{R}_{emp2}(g) = \|g - t\|^2$ are the empirical
losses, and $\sum_{i=1}^{n_{e1}} W_i^2$ and $\sum_{i=1}^{n_{e2}} M_i^2$ are the $\ell_2$ regularizers on the hyperedge
weights. The regularizers $\Omega_1(f)$ and $\Omega_2(g)$ are further described as follows:

$$
\begin{aligned}
\Omega_1(f) &= f^\top \left( I - D_{v1}^{-1/2} H_1 W D_{e1}^{-1} H_1^\top D_{v1}^{-1/2} \right) f, \\
\Omega_2(g) &= g^\top \left( I - D_{v2}^{-1/2} H_2 M D_{e2}^{-1} H_2^\top D_{v2}^{-1/2} \right) g.
\end{aligned}
\tag{9.35}
$$


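The loss function then has the following form in terms of $f$, $W$, $g$, and $M$:

$$
\begin{aligned}
\mathcal{L}(f, W, g, M) ={}& \Omega_1(f) + \lambda_1 \mathcal{R}_{emp1}(f) + \mu_1 \sum_{i=1}^{n_{e1}} W_i^2 \\
&+ \Omega_2(g) + \lambda_2 \mathcal{R}_{emp2}(g) + \mu_2 \sum_{i=1}^{n_{e2}} M_i^2 \\
&+ \eta_1 \left( \sum_{i=1}^{n_{e1}} W_i - 1 \right) + \eta_2 \left( \sum_{i=1}^{n_{e2}} M_i - 1 \right).
\end{aligned}
\tag{9.36}
$$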


To summarize, we introduced a two-layer multi-modal hypergraph learning
framework that models correlations among the visual, textual, and emoticon modalities
while tolerating missing modalities, so as to achieve sentiment prediction for
multi-modal tweets.


9.3.2 Social Event Detection 


Social event detection, a crucial social media analysis problem, has received much
attention in recent years, whereas the expanding visual content of microblogs and the
inter-connectedness of diverse data have received less attention from existing methods.
Figure 9.7 presents an example of a real-time social event. Event detection on social
media platforms is difficult because of the distinctive nature of social media data, for
the following reasons. First, because individual social media posts are noisy and do not
contain enough substantial material to provide full information, it is necessary to
explore a set of posts that are closely related to one another and discuss a common
issue. Second, social media posts can come in a variety of multimedia formats and
include information such as images, timestamps, locations, user preferences, and social
connections in addition to text. Finally, social posts arrive in real time, and the large
scale of these real-time data makes social events difficult to detect. Hypergraphs, due
to their natural structural advantages, can establish higher-order correlations between
data of different posts, different modalities, and different times, thus enabling
real-time event detection. In this subsection, we introduce a hypergraph-based method
for real-time social event detection. The overall framework is shown in Fig. 9.8.

Fig. 9.7 An example of a real-time social event. (a) Conversational text. (b) Heterogeneous
content. (c) Continuously growing real-time data. Parts of this figure are from [6]

Fig. 9.8 Overall framework of the real-time social event detection. This figure is from [6]


(1) Microblog Clique Generation 

A microblog clique (MC), which consists of a collection of closely connected microblogs,
is constructed as the basic unit in place of a single microblog in order to make up for
the lack of information. The microblogs in an MC cover the same subject within a short time.

A hypergraph is used to describe the relationships among the heterogeneous data
of various microblogs. A set of microblogs is denoted as $M = \{m_1, m_2, \ldots, m_n\}$. The
constructed hypergraph is $\mathcal{G}_M = \{\mathcal{V}, \mathcal{E}, W\}$, where a vertex $v$ represents a microblog
and a hyperedge $e$ represents a subset of microblogs. The hyperedge weight is
denoted as $w(e)$, and the diagonal matrix of hyperedge weights is $W$. To generate
hyperedges, the similarity between two microblogs $m_i$ and $m_j$ is first determined
using the following heterogeneous features.
The cosine similarity function is used for computing the textual and visual
similarities. The Haversine formula is used for measuring the geographical similarity.
The pairwise temporal similarity is calculated by $s_t(m_i, m_j) = 1 - \frac{|t_i - t_j|}{\tau}$. The
timestamps of $m_i$ and $m_j$ are $t_i$ and $t_j$, while $\tau$ denotes a normalizing constant.
The pairwise social similarity is measured by

$$
s_s(m_i, m_j) =
\begin{cases}
1, & \text{if } u_i = u_j \\
0.5, & \text{if } u_i \text{ and } u_j \text{ are linked through the social platform} \\
0, & \text{otherwise}
\end{cases}
, \tag{9.37}
$$

where $u_i$ is the owner of $m_i$.
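
The temporal and social similarities are simple enough to state directly in code. The sketch below assumes the temporal similarity has the form $1 - |t_i - t_j| / \tau$ and clips it at zero for gaps longer than $\tau$; the clipping, the function names, and the representation of social links as a set of unordered pairs are assumptions made for this example.

```python
def temporal_similarity(t_i, t_j, tau):
    """Temporal similarity of two timestamps, assuming 1 - |t_i - t_j| / tau.

    tau is the normalizing constant; values are clipped at 0 (an assumption)
    so that very distant posts are simply treated as unrelated.
    """
    return max(0.0, 1.0 - abs(t_i - t_j) / tau)

def social_similarity(owner_i, owner_j, links):
    """Social similarity of two microblogs given their owners.

    links: set of frozensets {a, b}, one per pair of users connected on the
    platform. Returns 1 for the same owner, 0.5 for linked owners, else 0.
    """
    if owner_i == owner_j:
        return 1.0
    if frozenset((owner_i, owner_j)) in links:
        return 0.5
    return 0.0
```

Storing links as unordered pairs keeps the measure symmetric, matching the pairwise nature of the other similarities.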

For each microblog $m_i$, two hyperedges are created by connecting it with its
neighbors in terms of geographic location and time information, respectively. For
each microblog $m_i$, the top $N$ nearest microblogs in terms of textual information and
visual content are also chosen to form hyperedges. Finally, all microblogs of the same
user are connected to generate a hyperedge. The incidence matrix, vertex degree, and
edge degree of the hypergraph are defined in the same way as above.

Next, MCs are generated by dividing the microblogs into groups of the same topic
through the hypergraph cut approach. Assume $S$ and $\bar{S}$ are the results of a two-way
partition of $\mathcal{V}$; the hypergraph cut can be described as

$$
\begin{aligned}
\mathrm{Cut}_H(S, \bar{S}) &:= \sum_{e \in \partial S} w(e) \frac{|e \cap S| \, |e \cap \bar{S}|}{\delta(e)}, \\
\partial S &:= \{ e \in \mathcal{E} \mid e \cap S \neq \emptyset,\ e \cap \bar{S} \neq \emptyset \}.
\end{aligned}
\tag{9.38}
$$

The definition of the two-way normalized partition is

$$
\mathrm{NCut}_H(S, \bar{S}) := \mathrm{Cut}_H(S, \bar{S}) \left( \frac{1}{\mathrm{vol}(S)} + \frac{1}{\mathrm{vol}(\bar{S})} \right), \tag{9.39}
$$

where the volume of $S$ is denoted by $\mathrm{vol}(S) = \sum_{v \in S} d(v)$. The normalized cut
problem can be relaxed into a real-valued optimization problem, whose solution is given
by the eigenvectors corresponding to the smallest non-zero eigenvalues of the hypergraph
Laplacian $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$. The input microblogs $M$ are split into
two groups, and then two-way normalized partitioning is carried out recursively in each
new set until the best partitioning outcome is attained. This best partitioning result
is determined by the representation capacity of the candidate partitions, as measured
by the Bayesian Information Criterion (BIC).
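
One two-way split of this recursive procedure can be sketched with NumPy. The sketch thresholds the eigenvector of the second-smallest eigenvalue of $\Delta$ at zero, which is one common way to round the relaxed solution and is an assumption here; the function name is hypothetical, and the recursion and BIC-based stopping rule are left out.

```python
import numpy as np

def two_way_partition(H, w):
    """Split hypergraph vertices into two groups (illustrative sketch).

    H: incidence matrix (n_vertices x n_hyperedges), entries in {0, 1}
    w: hyperedge weights (length n_hyperedges)
    Returns a boolean array giving the side of each vertex.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                         # vertex degrees d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)                 # hyperedge degrees delta(e)
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ dv_inv_sqrt
    delta = np.eye(H.shape[0]) - theta  # normalized hypergraph Laplacian
    _, eigvecs = np.linalg.eigh(delta)  # eigenvalues come back in ascending order
    second = eigvecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return second >= 0
```

On a toy hypergraph with two tight groups joined by a single weak hyperedge, the sign pattern of this eigenvector separates the groups.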
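BIC is used to choose the optimal hypergraph partitioning result. For $M =
\{m_1, m_2, \ldots, m_n\}$, with $P = \{P_1, P_2, \ldots, P_m\}$ as a set of partitions, the BIC score
is determined by

$$
\begin{aligned}
\mathrm{BIC} &= llh(M) - \frac{N_p}{2} \log n, \\
llh(M) &= \sum_{i=1}^{n} \left( \log \frac{1}{\sqrt{2\pi}\, \hat{\sigma}^{d}} - \frac{1}{2 \hat{\sigma}^2} \left\| m_i - c_{(i)} \right\|^2 + \log \frac{n_i}{n} \right), \\
\hat{\sigma}^2 &= \frac{1}{n - m} \sum_{i=1}^{n} \left\| m_i - c_{(i)} \right\|^2,
\end{aligned}
\tag{9.40}
$$

where $N_p$ represents the number of parameters, $d$ is the dimension of the microblog
features, $n$ is the number of microblogs, $n_i$ is the size of the partition containing $m_i$,
and $c_{(i)}$ is the centroid of that partition.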

After the given microblogs are divided into a set of MCs, each MC, as a collection of
strongly correlated microblogs, conveys more meaningful and pertinent information
than an individual microblog, which benefits the subsequent event detection
procedure.


(2) Detection of Social Events in Real Time 


Event Detection by Using MC For $MC = \{MC_1, \ldots, MC_{n_{mc}}\}$ and the corresponding
microblogs $M = \{m_1, \ldots, m_n\}$, there are two observations. First,
microblogs inside a single MC frequently refer to the same event (MC cues).
Second, MCs with similar features tend to be associated with the same event
(smoothness cues).

If a microblog is contained in an MC, it is connected to that MC to impose
MC cues. In order to enforce smoothness cues, pairwise MCs that are close to one
another in the feature space are connected. Formally, a bipartite graph $\mathcal{G}_B = \{X, Y, B\}$
is used to express $MC$ and $M$, and the two vertex sets are expressed as $X$ and $Y$, where
$X := MC \cup M$ and $Y := MC$, with $|X| = |MC| + |M|$ and $|Y| = |MC|$ vertices,
respectively. The definition of the across-affinity matrix $B$ between $X$ and $Y$ is as
follows:

$$
B_{ij} =
\begin{cases}
\eta, & \text{if } x_i \in M,\ x_i \in y_j,\ y_j \in MC \\
e^{-\gamma d_{ij}}, & \text{if } x_i \in MC,\ y_j \in MC \\
0, & \text{otherwise}
\end{cases}
, \tag{9.41}
$$

where $d_{ij}$ is the distance between two MCs, and $\eta$ and $\gamma$ are the two parameters that
balance the inner-MC correlation and the between-MC smoothness.
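
The across-affinity matrix can be assembled row by row, with the MC rows first and the microblog rows after them, following $X := MC \cup M$. The sketch assumes the MC-to-MC entry is $e^{-\gamma d_{ij}}$ and uses a hypothetical input format (a membership list and a precomputed MC distance matrix).

```python
import math

def across_affinity(membership, mc_dist, eta=1.0, gamma=1.0):
    """Across-affinity matrix B between X = MC ∪ M (rows) and Y = MC (columns).

    membership: membership[m] is the index of the MC containing microblog m
    mc_dist:    mc_dist[a][b] is the distance between MC a and MC b
    eta, gamma: weights of the MC cues and the smoothness cues
    """
    n_mc = len(mc_dist)
    B = []
    # MC rows: smoothness cues, close MCs receive a large affinity
    for a in range(n_mc):
        B.append([math.exp(-gamma * mc_dist[a][b]) for b in range(n_mc)])
    # Microblog rows: MC cues, each microblog links only to its own MC
    for mc in membership:
        B.append([eta if b == mc else 0.0 for b in range(n_mc)])
    return B
```

Raising `eta` relative to `gamma` makes the partition follow the existing MC memberships more strictly, while a small `gamma` lets distant MCs merge into one event more easily.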

The bipartite graph $\mathcal{G}_B$ and the required number of partitions $K$ are used as the
input of the transfer cut method to partition the MCs. First, assume $\mathcal{G}_Y = \{Y, W_Y\}$
contains only the vertices of the MCs. $L_Y = D_Y - W_Y$ is the graph Laplacian of $\mathcal{G}_Y$,
where $D_Y = \mathrm{diag}(B^\top \mathbf{1})$, $W_Y = B^\top D_X^{-1} B$, and $D_X = \mathrm{diag}(B \mathbf{1})$. Assume that
$\{(\lambda_i, v_i)\}_{i=1}^{K}$ are the $K$ smallest eigenpairs of $\mathcal{G}_Y$. The corresponding eigenpairs
$\{(\xi_i, f_i)\}_{i=1}^{K}$ of $\mathcal{G}_B$ can be calculated as

$$
0 \le \xi_i < 1, \quad \xi_i (2 - \xi_i) = \lambda_i, \quad u_i = \frac{1}{1 - \xi_i} Q v_i, \quad f_i = \left( u_i^\top, v_i^\top \right)^\top, \tag{9.42}
$$

where $Q = D_X^{-1} B$ is the corresponding transition probability matrix from $X$ to $Y$.
Second, $\{f_1, \ldots, f_K\}$ are clustered by $K$-way spectral clustering, and the best $K$ is
selected by BIC. Assume that $K_0$ is the number of existing events, which is initialized
to 0. Furthermore, suppose that the number of events for the incoming data is not larger
than $K_0 + n_{new}/t_m$, where the threshold $t_m$ decides the minimum number of microblogs
in an event. Therefore, the bipartite graph is segmented $n_{new}/t_m + 1$ times, and the
segmentation result selected by BIC is taken as the event detection result. Suppose
$\{\Gamma_1, \ldots, \Gamma_K\}$ are the $K$ events detected in the last process. The key MCs are found by
MC selection for each $\Gamma_k$, and each MC is ranked in terms of importance. Finally,
the top $n_{sMC}$ MCs are selected to describe each $\Gamma_k$.
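
The eigenpair transfer at the heart of the transfer cut can be checked numerically. The NumPy sketch below solves the small generalized eigenproblem on $Y$ and lifts the result to the bipartite graph, assuming the relations $\xi_i(2 - \xi_i) = \lambda_i$ and $u_i = Q v_i / (1 - \xi_i)$ and that the selected $\lambda_i$ are strictly below 1; the function name is hypothetical, and the clustering and BIC selection steps are omitted.

```python
import numpy as np

def transfer_cut_embedding(B, K):
    """Bottom-K generalized eigenpairs of the bipartite graph via the Y-side graph.

    B: across-affinity matrix (|X| x |Y|) with strictly positive row and column sums
    K: number of eigenpairs to return (their lambda values must be below 1)
    Returns (xi, F), where column i of F is f_i = (u_i^T, v_i^T)^T.
    """
    B = np.asarray(B, dtype=float)
    d_x = B.sum(axis=1)                       # D_X = diag(B 1)
    d_y = B.sum(axis=0)                       # D_Y = diag(B^T 1)
    Q = B / d_x[:, None]                      # transition matrix D_X^{-1} B
    W_y = B.T @ Q                             # W_Y = B^T D_X^{-1} B
    L_y = np.diag(d_y) - W_y
    # Solve L_Y v = lambda D_Y v through the symmetric matrix D_Y^{-1/2} L_Y D_Y^{-1/2}
    s = 1.0 / np.sqrt(d_y)
    lam, vecs = np.linalg.eigh(s[:, None] * L_y * s[None, :])
    lam = np.clip(lam[:K], 0.0, 1.0)
    V = s[:, None] * vecs[:, :K]
    xi = 1.0 - np.sqrt(1.0 - lam)             # root of xi (2 - xi) = lambda in [0, 1)
    U = (Q @ V) / (1.0 - xi)
    return xi, np.vstack([U, V])
```

The benefit of this route is that the eigenproblem has size $|Y| = |MC|$ rather than $|X| + |Y|$, which matters when the number of microblogs is much larger than the number of MCs.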


Detection of Incremental Social Events The real-time detection method is defined
as follows. Assume that event detection is run at time $t_0$, with generated MCs, i.e.,
$MC = \{MC_1, \ldots, MC_{n_{mc}}\}$, detected events $\{\Gamma_1, \ldots, \Gamma_K\}$, and noisy data. New data
arrive continuously from moment $t_0$ and can be processed at a short time interval $\Delta_t$.
In other words, event detection can be run at every $t_0 + x \times \Delta_t$, where $x = 1, 2, \ldots$.
Here, $t_0 + \Delta_t$ is used as an example, and $M_{new}$ stands for the newly
arriving microblogs. The two steps that make up event detection are MC generation
and event partition.

To generate new MCs, the MCs from previous time periods, $MC^* = \{MC_1^*, MC_2^*, \ldots,
MC_{n_{mc}}^*\}$, are used as known samples. $MC^*$ and $M_{new}$ are used to construct the
incremental microblog hypergraph $\mathcal{G}_M^{new}$. However, this is challenging because
a microblog collection cannot be compared directly with a single microblog. Therefore,
only the three most representative microblogs of each MC are chosen, according to the
numbers of retweets and comments, so that no more than $3 n_{mc}$ representative
microblogs are selected. To create the incremental microblog hypergraph $\mathcal{G}_M^{new}$, they are
merged with $M_{new}$. New MCs ($MC_{new_0}$) are then created from these data using the
hypergraph partition. Based on the representative microblogs, $MC_{new_0}$ and $MC^*$
are combined together. In this way, $n_{MC_{new}}$ new MCs ($MC_{new}$) are constructed and
utilized for event detection.

For detection in real time, the past events $\Gamma = \{\Gamma_1, \ldots, \Gamma_K\}$ are used as
known data in the time period. The corresponding representative MCs in $\Gamma$ and
the generated incremental $MC_{new}$ are used to jointly construct the next bipartite graph.
The difference is that for the identified events, the distance between MCs is set to 0 as
follows:

$$
d_{ij} =
\begin{cases}
0, & \text{if } x_i \in \Gamma_k \text{ and } y_j \in \Gamma_k \\
\min\limits_{m_{x_i} \in x_i,\ m_{y_j} \in y_j} d\left( m_{x_i}, m_{y_j} \right), & \text{otherwise}
\end{cases}
, \tag{9.43}
$$

where $k = 1, 2, \ldots, K$. Therefore, according to the BIC, the bipartite graph can be
partitioned into existing events and new events.
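
The incremental distance rule can be sketched as a small function: two MCs already assigned to the same known event are at distance zero, and otherwise the distance is the smallest pairwise distance between their member microblogs. The function name, the dictionary-based inputs, and the pluggable `dist` callable are assumptions for illustration.

```python
def incremental_mc_distance(x, y, members, event_of, dist):
    """Distance between two MCs during incremental detection (illustrative sketch).

    x, y:      identifiers of the two MCs
    members:   dict mapping MC id -> list of member microblog features
    event_of:  dict mapping MC id -> known event id (absent for new MCs)
    dist:      callable giving the distance between two microblog features
    """
    ex, ey = event_of.get(x), event_of.get(y)
    if ex is not None and ex == ey:
        return 0.0                 # both MCs already belong to the same event
    # Otherwise use the closest pair of member microblogs
    return min(dist(a, b) for a in members[x] for b in members[y])
```

Forcing the distance to zero keeps previously detected events intact when the bipartite graph is partitioned again with the new data.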


There are still several challenging problems in hypergraph computation for
sentiment analysis tasks that merit further research. First, for the
sentiment recognition task, the case of conflicting multi-modal information can be
considered. Second, for the detection of real-time social events, further consideration
can be given to the information that may be hidden in broken posts and users.
These tasks take into account the positive or negative associations among multiple
entities, for which the hypergraph is a suitable modeling tool.


9.4 Emotion Recognition 


Emotion recognition has gained wide recognition in neuroscience and psychology 
research [11], and artificial intelligence offers more reliable and accurate com- 
putational models for the identification and study of emotions. It has also been 
extensively applied in real life [12], especially in human-computer interaction, 
motor vehicle driving assistance training, emotion classification in movies, and other 
pertinent similar areas [13]. 

Emotion recognition has three main goals [14]: first, to enable the understanding, 
inference, and recognition of human emotions by intelligent systems; second, to 
make it possible for systems to make human-like expressions of emotion in response 
to stimuli (e.g., conversational agents or robots); and third, to make it possible 
for intelligent systems to actually perceive emotions. Over the past three decades, 
researchers from several disciplines have pursued these three goals in different ways, 
with the method of recognizing emotions as the central issue of research. Although 
it has been studied for many years, progress is still being made. The reality is that 
there are various ways for people to convey their emotions, including language, 
gestures, facial expressions, and physiological signs [15]. Finding a suitable method 
to identify and analyze human emotions may be a long-term problem. Human 
volition determines the first three modalities, and there are substantial individual 
variances [16]. Because of these, approaches based on these three modalities have 
limitations in terms of accuracy and reliability. In contrast, physiological signals 
cannot be readily blocked or concealed and are simultaneously governed by the 
body’s neurological and hormonal systems. They are also often independent of 
human will. Therefore, physiological signals rather than visual or auditory cues may 
offer more accurate information about emotions [17]. A multitude of environmental 
and psychological elements, including interests and personality, can have an impact 
on human emotion, which is a highly subjective phenomenon. 

Nonetheless, because of the following factors, recognizing emotions through 
physiological signals is still a work in progress: 


* Existence of the emotional gap and ambiguity in the concept of emotions [18] 
* Potential associations between modality and subject [19] 
* Specificity of the stimulus response (SR) and individual response (IR) [20]
* Noisy and incomplete data [21]
* Multifactorial influences [22] 


In this case, the hypergraph structure allows the establishment of complex 
correlations that can simultaneously take into account: (a) correlations between 
EEG, EOG, and EMG signals, which are signals from several modalities; (b) 
correlations between subjects; and (c) patterns of physiological signal changes in 
a single subject in response to various stimuli. Two methods are presented for 
emotion prediction using hypergraph computation, including multi-modal vertex- 
weighted hypergraph learning (MVHL) [7, 8] and multi-hypergraph neural networks 
(MHGNN) [9]. 


(1) Multi-Modal Vertex-Weighted Hypergraph Learning

Hypergraphs have been used to depict the link between physiological data and
personality [7]. Building on this, MVHL introduces a multi-modal vertex-weighted
hypergraph learning method for personalized emotion recognition (PER) that takes
into account vertex weights, hyperedge weights, and modality weights. Each vertex
in this method is a composite tuple (subject, stimulus). A hypergraph structure is
used to model personality correlations between various subjects and physiological
correlations between the corresponding stimuli. The weights of each vertex and
hyperedge, as well as the weights of the various hypergraphs, are learned
automatically. Hyperedge weights are used to create the optimal representation,
while vertex weights describe the impact of various samples and patterns in the
learning process. The learned factors, known as emotion relevance, are obtained on
the multi-modal vertex-weighted hypergraph and are employed for emotion
recognition. Because the vertices are composite and incorporate data from various
subjects, MVHL can recognize the emotions of multiple subjects at once.

The framework of this model is as follows. First, composite vertices (subject,
stimulus) are formed from the subjects and the stimuli used to elicit the
subjects’ emotions. Second, multi-modal hyperedges are constructed to form
personality associations among different subjects and physiological associations
among the corresponding stimuli. Finally, after joint learning of vertex-weighted
multi-modal multi-task hypergraphs, PER results can be obtained.


Hypergraph Construction This model constructs the hypergraph structure from the
pairwise similarity between samples. The similarity between the personalities of
subjects $u_i$ and $u_j$ is measured by the cosine function:

$$s_{PER}(u_i, u_j) = \frac{\langle p_i, p_j \rangle}{\|p_i\| \cdot \|p_j\|}, \tag{9.44}$$

where $p_i$ denotes the personality vector of $u_i$. Each vertex is selected in turn
as the centroid, and a hyperedge is built to link the centroid to its K nearest
neighbors in the existing representation space. It should be noted that
personality hyperedges are built from both intra- and inter-subject viewpoints. A
hyperedge links all the vertices from the same subject together. Additionally, based
on personality similarities, the closest K subjects for each subject are chosen, and
all of their vertices are connected by creating another hyperedge.
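The construction above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation of [7]: the function names, the toy personality vectors, and the choice to place the subject itself inside its inter-subject hyperedge are assumptions made for the example.

```python
import numpy as np

def cosine_similarity_matrix(P):
    """Pairwise cosine similarity between the rows of P (one personality vector per subject)."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    P_unit = P / np.clip(norms, 1e-12, None)
    return P_unit @ P_unit.T

def personality_hyperedges(P, vertex_subject, k):
    """Incidence matrix with one intra- and one inter-subject hyperedge per subject.

    P              : (n_subjects, d) personality vectors
    vertex_subject : subject index of every (subject, stimulus) vertex
    k              : number of most similar subjects joined in an inter-subject hyperedge
    """
    vertex_subject = np.asarray(vertex_subject)
    S = cosine_similarity_matrix(P)
    columns = []
    for i in range(P.shape[0]):
        # Intra-subject hyperedge: all vertices that belong to subject i.
        columns.append(vertex_subject == i)
        # Inter-subject hyperedge: vertices of subject i and of its k most similar subjects
        # (including subject i itself is an assumption of this sketch).
        order = np.argsort(-S[i])
        neighbours = [j for j in order if j != i][:k]
        columns.append(np.isin(vertex_subject, [i] + neighbours))
    return np.stack(columns, axis=1).astype(float)

# Toy data: three subjects, two stimuli each (hypothetical values).
P = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
vertex_subject = [0, 0, 1, 1, 2, 2]
H = personality_hyperedges(P, vertex_subject, k=1)
```

Each column of `H` is one hyperedge; even-numbered columns hold the intra-subject hyperedges and odd-numbered columns the inter-subject ones.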

Assume that the constructed hypergraphs are $\mathcal{G}_m = (\mathcal{V}_m, \mathcal{E}_m, \mathbf{W}_m)$,
where $\mathcal{V}_m$ and $\mathcal{E}_m$ denote the vertex set and hyperedge set, respectively, and
$\mathbf{W}_m$ is the diagonal hyperedge weight matrix of the m-th hypergraph
(m = 1, 2, ..., M). The incidence matrix $\mathbf{H}_m$ can be computed as

$$\mathbf{H}_m(v, e) = \begin{cases} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e. \end{cases} \tag{9.45}$$

The different weights of the vertices are learned to evaluate their value and
contribution to the learning process. This is distinct from the classic hypergraph
learning method, which simply treats all the vertices equally. Assume $\mathbf{U}_m$ is the
diagonal matrix of vertex weights. The vertex degree and the hyperedge degree
are defined as $d_m(v) = \sum_{e \in \mathcal{E}_m} \mathbf{W}_m(e)\mathbf{H}_m(v, e)$ and
$\delta_m(e) = \sum_{v \in \mathcal{V}_m} \mathbf{U}_m(v)\mathbf{H}_m(v, e)$.
Accordingly, the two diagonal matrices are defined as $\mathbf{D}_v^m(i, i) = d_m(v_i)$ and
$\mathbf{D}_e^m(i, i) = \delta_m(e_i)$.


Multi-Modal Vertex-Weighted Hypergraph Learning The goal is to simultaneously
study the correlations among the included physiological signals and the
personality relations across various subjects. The framework of the multi-modal
vertex-weighted hypergraph learning is presented in Fig. 9.9. Given N subjects
$u_1, \ldots, u_N$ and the involved stimuli $s_{ij}$ ($j = 1, \ldots, n_i$) for $u_i$, we assume that
the compound vertices and associated labels of the c-th emotion category are
$\{(u_1, s_{1j})\}_{j=1}^{n_1}, \ldots, \{(u_N, s_{Nj})\}_{j=1}^{n_N}$ and
$\mathbf{y}_{1c} = [y_{11}^c, \ldots, y_{1n_1}^c]^T, \ldots, \mathbf{y}_{Nc} = [y_{N1}^c, \ldots, y_{Nn_N}^c]^T$,
where $c = 1, \ldots, n_e$.

The count of emotion categories is denoted as $n_e$. The estimated values of
all stimuli associated with the specified users for the c-th emotion category, also
known as emotion relevance, are given by
$\mathbf{r}_{1c} = [r_{11}^c, \ldots, r_{1n_1}^c]^T, \ldots, \mathbf{r}_{Nc} = [r_{N1}^c, \ldots, r_{Nn_N}^c]^T$.
$\mathbf{y}_c$ and $\mathbf{r}_c$ are denoted by

$$\mathbf{y}_c = [\mathbf{y}_{1c}^T, \ldots, \mathbf{y}_{Nc}^T]^T, \quad \mathbf{r}_c = [\mathbf{r}_{1c}^T, \ldots, \mathbf{r}_{Nc}^T]^T. \tag{9.46}$$

Fig. 9.9 Overall framework of the multi-modal vertex-weighted hypergraph learning. This figure
is from [7]

Let $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_c, \ldots, \mathbf{y}_{n_e}]$ and
$\mathbf{R} = [\mathbf{r}_1, \ldots, \mathbf{r}_c, \ldots, \mathbf{r}_{n_e}]$, and let $\lambda$ and $\eta$ be
the two trade-off parameters. The regularizer on the hypergraph structure is defined as follows:

$$\Psi(\mathbf{R}, \mathbf{W}, \mathbf{U}, \boldsymbol{\alpha}) = \sum_{c=1}^{n_e} \mathbf{r}_c^T \sum_{m=1}^{M} \alpha_m (\mathbf{U}_m - \Theta_m)\, \mathbf{r}_c, \tag{9.47}$$

where $\Theta_m = (\mathbf{D}_v^m)^{-1/2} \mathbf{U}_m \mathbf{H}_m \mathbf{W}_m (\mathbf{D}_e^m)^{-1} \mathbf{H}_m^T \mathbf{U}_m (\mathbf{D}_v^m)^{-1/2}$.
Then, $\Delta = \sum_{m=1}^{M} \alpha_m (\mathbf{U}_m - \Theta_m)$ can be seen as the fused hypergraph Laplacian
with vertex weights.
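As a rough illustration of how the fused Laplacian could be assembled, the following NumPy sketch computes $\mathbf{U}_m - \Theta_m$ for each hypergraph and sums the results with the weights $\alpha_m$. It assumes the vertex degree $d(v) = \sum_e W(e)H(v, e)$ and the hyperedge degree $\delta(e) = \sum_v U(v)H(v, e)$; the function names and the toy incidence matrix are illustrative and not taken from [7].

```python
import numpy as np

def vertex_weighted_laplacian(H, w, u):
    """Return U - Theta for one hypergraph.

    H : (n_vertices, n_edges) incidence matrix
    w : (n_edges,) hyperedge weights
    u : (n_vertices,) vertex weights
    """
    d_v = H @ w                      # vertex degrees, weighted by hyperedge weights
    delta_e = H.T @ u                # hyperedge degrees, weighted by vertex weights
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / delta_e)
    U, W = np.diag(u), np.diag(w)
    Theta = Dv_inv_sqrt @ U @ H @ W @ De_inv @ H.T @ U @ Dv_inv_sqrt
    return U - Theta

def fused_laplacian(Hs, ws, us, alphas):
    """Weighted sum of the per-hypergraph vertex-weighted Laplacians."""
    return sum(a * vertex_weighted_laplacian(H, w, u)
               for H, w, u, a in zip(Hs, ws, us, alphas))

# Toy hypergraph: 3 vertices, 2 hyperedges, unit weights (hypothetical values).
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
w, u = np.ones(2), np.ones(3)
L = vertex_weighted_laplacian(H, w, u)
Delta = fused_laplacian([H, H], [w, w], [u, u], [0.5, 0.5])
```

With unit vertex weights the expression reduces to the classic normalized hypergraph Laplacian, which is a convenient sanity check for the sketch.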

(2) Multi-Hypergraph Neural Networks 

Multi-hypergraph neural network (MHGNN) uses hypergraphs to model complex
correlations and recognize emotions from physiological signals. It can take into
account: (a) correlations between signals of various modalities, i.e., EEG, EOG,
and EMG; (b) relationships between subjects; and (c) patterns of physiological
signal changes in a single person in response to various stimuli. This model groups
each subject and a stimulus into a composite tuple, which is treated as a vertex in
the hypergraph. A hypergraph is generated for each modality from the corresponding
physiological signal, with hyperedges expressing the correlations among the
physiological signals in response to various stimuli. The vertices are then
classified within the MHGNN framework in accordance with the intricate
relationships in the data. As a result, classifying the vertices in the various
hypergraphs is equivalent to recognizing emotions. The different hypergraph neural
networks are combined using a fully connected network, which also takes into
account the relative relevance of the various multi-modal physiological signals
when classifying emotions. This framework’s primary benefit is its ability to
combine multi-modal data and to represent the three kinds of relationships listed
above. Figure 9.10 shows the pipeline of the MHGNN framework.


Modeling of Multi-Hypergraph Subject correlation is formulated using a multi-
hypergraph structure given a number of features from various physiological inputs.
Each modality is represented by a separate hypergraph. The connections between
the vertices of the hypergraph are constructed using hyperedges, and each vertex
on the hypergraph represents a subject to be learned together with a description of
its corresponding stimuli. The k-NN method is used to generate hypergraphs, where
k is a hyperparameter controlling the connectivity. Each vertex is chosen as a
centroid once, and the set of hyperedges is complete after all vertices have acted
as the centroid. We assume that $S = \{s_1, s_2, \ldots, s_n\}$ is a training set whose
features in modality i are $X^{(i)} = \{x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)}\}$, where the vector
$x_j^{(i)}$ is the feature of the j-th training sample from modality i and $s_j$ denotes
the j-th training sample. According to the k-NN approach, the vertex $v_p$ shares a
hyperedge with its k nearest vertices, and hyperedge $e_p$ is centered on the vertex
$v_p$. The distance between two vertices is the Euclidean distance between the
corresponding feature vectors. The correlation between vertex p and vertex q
is represented by the matrix element $A_{p,q}$. As an exponential function of the
Euclidean distance, the correlation can be described as

$$A_{p,q} = \exp\left(-d\left(x_p^{(i)}, x_q^{(i)}\right)\right), \tag{9.48}$$

where $d(x_p^{(i)}, x_q^{(i)})$ stands for the feature-space Euclidean distance between
samples p and q. The weight matrix W is set to the identity matrix in this model
because no prior knowledge is available regarding the significance of the
hyperedges. As a result, the incidence matrix $H^{(i)}$ contains all the information
about the hypergraph.

An incidence matrix $H^{(i)}$ is generated for each modality, so that m incidence
matrices are generated for m modalities.

Fig. 9.10 The pipeline of multi-hypergraph neural networks
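A minimal NumPy sketch of the per-modality k-NN construction is given below. It assumes a binary incidence matrix in which hyperedge $e_p$ contains the centroid $v_p$ and its k nearest vertices, and it computes the correlation matrix as a plain negative exponential of the Euclidean distance; the function name and toy features are hypothetical.

```python
import numpy as np

def knn_hypergraph(X, k):
    """Build a k-NN hypergraph for one modality.

    X : (n, d) feature matrix, one row per vertex
    Returns (H, A): H[q, p] = 1 if vertex q lies in the hyperedge centred on vertex p,
    and A[p, q] = exp(-distance(p, q)).
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = np.exp(-dist)
    H = np.zeros((n, n))
    for p in range(n):
        # k nearest vertices of p, excluding p itself.
        nearest = [q for q in np.argsort(dist[p]) if q != p][:k]
        H[nearest, p] = 1.0
        H[p, p] = 1.0          # the centroid always belongs to its own hyperedge
    return H, A

# One incidence matrix per modality (toy one-dimensional features).
X = np.array([[0.0], [1.0], [5.0], [6.0]])
H, A = knn_hypergraph(X, k=1)
```

Calling `knn_hypergraph` once per modality yields the m incidence matrices used by the m hypergraph neural networks.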


Multi-Hypergraph Convolutional Networks The creation of subject representa- 
tion and subsequent emotion classification are crucial steps in emotion recognition. 
Deep neural networks have made significant progress in the representation of data in 
the last few years. However, given the intricacy of data correlations, this remains
a work in progress. In order to represent data and recognize emotions, a
multi-hypergraph convolutional network framework that can simultaneously take into
account several physiological inputs from different people is developed.

In a hypergraph convolutional network, the spatial convolution is viewed from
the perspective of graph spectral theory as a spectral matrix product, and the
hypergraph Laplacian $\Delta$ is leveraged to convert it from the spatial domain to the
spectral domain. $\Delta$ can be formulated as
$\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$, where
$D_e$ and $D_v$ are the matrices of hyperedge degree and vertex degree, respectively.
In this case, a hypergraph convolutional layer can be formulated for each
modality as

$$X_{l+1}^{(i)} = \sigma\left( (D_v^{(i)})^{-1/2} H^{(i)} W^{(i)} (D_e^{(i)})^{-1} (H^{(i)})^T (D_v^{(i)})^{-1/2} X_l^{(i)} \Theta_l^{(i)} \right), \tag{9.49}$$

where $\Theta_l^{(i)}$ is the learnable parameter of the l-th layer in the i-th hypergraph
neural network (HGNN) and $\sigma$ is the activation function. When using hypergraph
convolution, the parameters $\Theta$ are updated by backpropagation through the features
$X^{(i)}$. Hypergraph structure-related quantities, such as
$(D_v^{(i)})^{-1/2} H^{(i)} W^{(i)} (D_e^{(i)})^{-1} (H^{(i)})^T (D_v^{(i)})^{-1/2}$,
are pre-computed and are not trainable in this procedure. The symbol $\tilde{\Delta}^{(i)}$ is used
to represent this product for simplification, and the hypergraph convolutional
layer can be rewritten as

$$X_{l+1}^{(i)} = \sigma\left( \tilde{\Delta}^{(i)} X_l^{(i)} \Theta_l^{(i)} \right). \tag{9.50}$$

It is important to note that the formulations of graph convolution and hypergraph
convolution are similar. The graph convolution is shown as follows:

$$X_{l+1}^{(i)} = \sigma\left( (D^{(i)})^{-1/2} A^{(i)} (D^{(i)})^{-1/2} X_l^{(i)} \Theta_l^{(i)} \right). \tag{9.51}$$
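The forward pass of one hypergraph convolutional layer can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: ReLU is used as the activation $\sigma$, the hyperedge weights default to one, and the function names and toy matrices are hypothetical rather than taken from [9].

```python
import numpy as np

def hypergraph_structure_matrix(H, w=None):
    """Pre-compute Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for one hypergraph."""
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                 # vertex degrees
    d_e = H.sum(axis=0)         # hyperedge degrees
    Dv_inv_sqrt = np.diag(d_v ** -0.5)
    De_inv = np.diag(1.0 / d_e)
    return Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt

def hgnn_layer(G, X, Theta):
    """One hypergraph convolution: sigma(G X Theta), with ReLU as sigma (an assumption)."""
    return np.maximum(G @ X @ Theta, 0.0)

# Toy example: 3 vertices, 2 hyperedges, 4 input features, 2 output features.
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
G = hypergraph_structure_matrix(H)      # fixed, not trainable
X = rng.normal(size=(3, 4))
Theta = rng.normal(size=(4, 2))         # trainable parameters in a real model
X_next = hgnn_layer(G, X, Theta)
```

Because `G` depends only on the hypergraph structure, it is computed once and reused in every layer and every training step; only `Theta` would receive gradients.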

In traditional single hypergraph neural network models, hyperedges built from the
features of several modalities are concatenated. However, because the features
have distinct sizes and dimensions, such hyperedges are known to be inconsistent.
Additionally, the modalities may differ in how much they contribute to the task:
some could be crucial, while others might be less important. A single hypergraph
model with identical weights cannot capture such differences, and simply
concatenating distinct hyperedges makes it difficult to weight them individually.
A multi-hypergraph neural network structure is therefore introduced to integrate
multiple hypergraph structures and address this issue.

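To calculate intermediate representations for each modality, m hypergraph neural
network models are built using m hypergraphs for m modalities. The K-layer i-th
hypergraph neural network may be expressed as follows:

$$\mathrm{HGNN}(H^{(i)}, X^{(i)}) = \sigma\left( \tilde{\Delta}^{(i)} \cdots \sigma\left( \tilde{\Delta}^{(i)} X^{(i)} \Theta_1^{(i)} \right) \cdots \Theta_K^{(i)} \right), \tag{9.52}$$

where $\tilde{\Delta}^{(i)}$ denotes the pre-computed structure matrix of the i-th hypergraph and
$\Theta_k^{(i)}$ the parameters of its k-th layer.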
The final output is then generated from the m intermediate representations by a
fully connected layer. Acting as a fusion layer, it dynamically combines the
outputs of the hypergraph convolutions and weights them according to their
contributions. A softmax layer serves as the classifier. In the network layers
with different hypergraph structures, modality features of various sizes and
dimensions are learned. Finally, they are weighted automatically and merged in
the fusion layer.

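$W_f$ and $b_f$ stand for the weights and bias of the fusion layer, respectively. The
model can be expressed as follows:

$$\mathrm{MHGNN}(X^{(1)}, X^{(2)}, \ldots, X^{(m)}) = \mathrm{softmax}\left( W_f W_m \left[ \mathrm{HGNN}(H^{(1)}, X^{(1)}), \mathrm{HGNN}(H^{(2)}, X^{(2)}), \ldots, \mathrm{HGNN}(H^{(m)}, X^{(m)}) \right] + b_f \right), \tag{9.53}$$

where the matrix of modality weights is denoted by
$W_m = \mathrm{Diag}(w^{(1)}, w^{(2)}, \ldots, w^{(m)})$.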
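The fusion step can be sketched as a plain NumPy forward pass. Several choices here are assumptions made for illustration: samples are stored as rows (so the products appear transposed relative to Eq. (9.53)), the modality weights scale each modality's block before concatenation, ReLU is the activation, and the structure matrices are simple placeholders rather than real hypergraphs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hgnn_forward(G, X, thetas):
    """Stack of hypergraph convolutions sigma(G X Theta) for one modality."""
    for Theta in thetas:
        X = np.maximum(G @ X @ Theta, 0.0)
    return X

def mhgnn_forward(Gs, Xs, thetas_per_modality, modality_weights, W_f, b_f):
    """Weight each modality's representation, concatenate, fuse, and classify."""
    blocks = [
        w * hgnn_forward(G, X, thetas)
        for G, X, thetas, w in zip(Gs, Xs, thetas_per_modality, modality_weights)
    ]
    Z = np.concatenate(blocks, axis=1)      # (n_samples, m * hidden)
    return softmax(Z @ W_f + b_f)           # fully connected fusion layer + softmax

# Toy setup: 4 vertices, 2 modalities with 3 and 5 input features (hypothetical sizes).
rng = np.random.default_rng(0)
n, hidden, n_classes = 4, 2, 3
dims = [3, 5]
Gs = [np.full((n, n), 1.0 / n) for _ in dims]   # placeholder structure matrices
Xs = [rng.normal(size=(n, d)) for d in dims]
thetas_per_modality = [
    [rng.normal(size=(d, 4)), rng.normal(size=(4, hidden))] for d in dims
]
W_f = rng.normal(size=(hidden * len(dims), n_classes))
b_f = np.zeros(n_classes)
probs = mhgnn_forward(Gs, Xs, thetas_per_modality, [0.7, 0.3], W_f, b_f)
```

In a trained model the modality weights, `W_f`, `b_f`, and all `Theta` matrices would be learned jointly; here they are random or fixed values used only to exercise the shapes.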


By examining the experimental results through the network structure of the
hypergraph, the patterns were found to reflect a pair of interconnected and
mutually reinforcing interdisciplinary concerns. Another intriguing observation in
the experiments was the variation in each subject’s physiological characteristics.
Therefore, it is advisable to: (a) collect data according to the requirements of
real application scenarios; (b) pay attention to individual differences; (c) analyze
correlations between the subjects of the training and test samples; and (d) add
more information, such as action recognition information. Hypergraphs are
considered a good tool for discovering such biological patterns.


9.5 Summary 


In this chapter, to illustrate the paradigm of using hypergraph computation in social 
media analysis, we overview three applications, i.e., recommender system, senti- 
ment analysis, and emotion recognition. In recommender system, we discuss two 
specific applications: collaborative filtering and attribute inference. Collaborative 
filtering only considers the raw user-item network, and hypergraph is used to model 
the inter- and intra-domain (user or item) correlations in behavior space. Attribute 
inference further takes the attribute information into consideration in addition to 
the historical interactions. Besides, context information such as time and location 
can also be integrated, which is left to explore. In sentiment analysis, sentiment 
prediction and social event detection are covered. The former task mainly concerns 
the sentiment conveyed by each multi-modal tweet, while the latter one focuses on 
exploring a group of postings that are closely connected and cover the same subjects. 
Furthermore, recognizing the emotion of people through multi-modal physiological
signals is also presented. There are still many social media analysis applications
worth exploring with hypergraph computation. For example, heterogeneous
correlations widely exist in the social media context, and how to utilize the
complementary information among these heterogeneous associations with hypergraph
computation has become a key issue. Besides, social media data are always dynamic
rather than static, and newly arriving data may have different distributions from
the existing data. Under such circumstances, static hypergraph computation methods
cannot be directly applied, and the dynamic hypergraph computation paradigm
deserves investigation to address this complex issue.
Chapter 10
Hypergraph Computation for Medical
and Biological Applications


Abstract Hypergraph computation, with its superior capability in complex data 
modeling, is a powerful tool for many medical and biological applications. In this 
chapter, we introduce four typical examples of the use of hypergraph computation 
in medical and biological applications, i.e., computer-aided diagnosis, survival 
prediction with histopathological images, drug discovery, and medical image seg- 
mentation. In each application, we present how to construct the hypergraph structure 
with different kinds of medical and biological data and different hypergraph 
computation strategies for these tasks respectively. We can notice that hypergraph 
computation has shown advantages in these applications. 


10.1 Introduction 


In the past few decades, massive biological and medical data were generated owing 
to the rapid development of big data techniques. These data can be used for tasks of 
disease gene analysis, disease risk assessment, targeted drug discovery, etc. The 
data further contribute to disease prevention and early diagnosis and treatment 
of diseases. The biological and medical data are complex, heterogeneous, and 
multi-modal, with widespread inter- and intra-data correlations. For example, in 
early disease diagnosis, patients with similar medical image appearance may also 
share similar disease conditions; different modalities of the medical image of the 
same patient, such as MRI and CT, may also exhibit disease characteristics from 
different perspectives; the patches within gigapixel histopathological images may 
have implicit collaborative associations that reveal patients’ potential health risks. 
Therefore, how to model such correlation behind these data is very important for 
medical and biological applications. 

Hypergraphs, which own the flexible hyperedges, provide a possible solution 
for modeling such complex correlations within medical and biological data. Given 
the observed data, the hypergraph structure can be generated using the previously 
mentioned methods and naturally incorporate multi-modal or heterogeneous data 
by concatenation of hyperedge groups and thus can discriminatively utilize the 
complementary information of these data. The applications of hypergraph computation
in medical and biological tasks can be typically summarized as follows: (1)
modeling the medical images, the patches, or the biological entities as vertices,
and connecting them with hyperedges following their feature similarity or high-
order topological links; (2) exploring the high-order correlations between data using 
hypergraph label propagation or hypergraph neural networks so as to enhance the 
vertex representations; and (3) deploying these representations on the medical and 
biological tasks, such as medical image retrieval, disease identification, cancer tissue 
classification, survival prediction, and medical image segmentation. 

In this chapter, we discuss four typical uses of hypergraph computation
in medical and biological applications, i.e., computer-aided diagnosis,
survival prediction with histopathological images, drug discovery, and medical 
image segmentation. In computer-aided diagnosis, three specific applications are 
included, i.e., the identification [1] and medical image retrieval of MCI [2], autism 
spectrum disorder identification using brain functional networks [3], as well as 
the identification of COVID-19 by CT imaging [4]. For survival prediction with 
histopathological images, two techniques targeting different cases are displayed, 
including ranking-based survival prediction [5] and multi-hypergraph modeling 
for survival prediction [6]. In drug discovery, a heterogeneous hypergraph-based 
drug-target interaction prediction technique [7] is presented. For medical image 
segmentation, we introduce the hierarchical hypergraph patch labeling method. Part 
of the work introduced in this chapter has been published in [1–8].


10.2 Computer-Aided Diagnosis 


Computer-aided diagnosis has made clinical diagnosis incredibly convenient with 
the advancement of artificial intelligence and owing to the widespread use of 
medical imaging data, including MRI, CT scan, histopathological images, and so 
on. Its main goal is to pursue a preliminary examination of patients for clinicians in 
order to increase diagnostic accuracy, avoid missed illnesses, and improve work 
efficiency. Many challenges still exist in the field of computer-aided diagnosis
despite great advances in machine learning and deep learning research. These
include the inadequate use of information shared among patients and among
different forms of medical images, the persistence of noisy data (such as
variations across CT manufacturers and patient movement during imaging), and the
ambiguity of cases in the early stages of illness.

In traditional approaches, the relationships among patients are frequently ignored,
and each patient is considered in isolation. The illness information of patients
with similar medical images can help improve the accuracy of computer-aided
diagnosis, since it is reasonable to expect that if the MRI or CT features of
patients are related, then their disease conditions should also be similar.
Because hyperedges in hypergraphs, unlike edges in graphs, can connect two or more
vertices, hypergraphs offer a potential solution to the first challenge by
representing high-order illness connections among multiple individuals.




Computer-aided diagnosis with medical images typically consists of three main
steps. The first step is image pre-processing, which mostly consists of enhancing
visual information, filtering out the background, and separating the region of
interest from blank areas to lessen interference from irrelevant regions. The next
step is to extract features from the region of interest. Imaging features,
including infection lesion count, mean lesion area, lesion density, and
morphological aspects, must be extracted from the images since they are informative
and contain task-independent information. The final step is to use machine
learning, deep learning, or other statistical approaches to diagnose patients and
then identify disease types and lesion types with the features gathered in the
previous steps.

The use of hypergraph computation techniques in computer-aided diagnosis is
introduced in the subsections that follow. Four specific applications are covered,
namely MCI identification using MRI [1], medical image retrieval [2], COVID-19
identification using CT imaging [4], and ASD identification using brain functional
networks [3]. First, we present a strategy for creating a hypergraph for each MRI
sequence and modeling the optimal correlation among patients using the information
shared by several MRI sequences. We then explain how to learn multi-graph
combination weights to discover the association between query subjects and the
existing subject classes, which enhances the precision of medical image retrieval.
In the third part, we describe the uncertainty vertex-weighted hypergraph learning
approach for distinguishing COVID-19 from other types of pneumonia. Finally, we
show the application of dynamic hypergraph learning methods to diagnosing autism
in children using multi-modal functional connectivity.


10.2.1 MCI Identification Using MRI 


Identifying the initial phase of Alzheimer’s disease (AD), i.e., mild cognitive
impairment (MCI), to support diagnosis is an important but challenging task, since
AD is a relatively common form of dementia among seniors. Research has
demonstrated that combining data from various modalities can improve the accuracy
of diagnosing AD/MCI. The hypergraph computation approaches introduced below
therefore use clinically routine scans, which capture multiple MR sequences
reflecting various aspects of brain structure or function, and attempt to combine
them optimally.

The centralized hypergraph learning method (CHL) [1] integrates multiple types of
imaging data in a semi-supervised manner to estimate correlations among subjects,
which indicate the possibility that subjects belong to the same class. This
improves the utilization of multi-modal data; an overview is shown in Fig. 10.1.
In contrast to ordinary graphs, hypergraphs propagate information through
hyperedges that connect two or more vertices concurrently. They can also capture
higher-order relationships among subjects by selecting the nearest neighbors in
the feature space, i.e., whether a set of subjects in this task shares common
information. Each subject can therefore make the most of the knowledge from the MR
sequences through the concurrent optimization of the correlations and the
hyperedge weights among subjects. The process is presented below in two stages:
the construction of centralized hypergraphs from the processed data, and
centralized hypergraph learning.

Fig. 10.1 A pipeline to classify MCI or NC from multi-modal imaging data using centralized
hypergraph learning. This figure is from [1]

Different types of imaging data from patients with MCI and normal control
(NC) need to be pre-processed into features before such data are used to construct
the hypergraphs. Thereafter, a hypergraph $\mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i, W_i)$ is constructed for
every type of imaging data, where each subject is considered as a vertex and
the star expansion procedure is used to generate hyperedges. In particular, every
vertex in each feature space is taken in turn as the central vertex for generating
a hyperedge, which consists of the vertices located within distance $\varphi \bar{d}_i$ of the
central vertex, where $\varphi$ is a hyperparameter and $\bar{d}_i$ is the mean distance between
vertices in the feature space. The hypergraph incidence matrix $H_i$ produced by the
star expansion procedure is formalized as

$$H_i(v, e) = \begin{cases} \exp\left( -\dfrac{d_i(v, v_c)}{\bar{d}_i} \right), & \text{if } v \in e \\ 0, & \text{otherwise,} \end{cases} \tag{10.1}$$

where $d_i(v, v_c)$ represents the distance from the vertex $v$ to the corresponding
central vertex $v_c$, and $\bar{d}_i$ is the mean distance between vertices in the feature
space of the i-th type of imaging data. It should be noted that the hyperedge
weights $W_i$ are initialized with the same value, e.g., 1, when the hypergraph is
generated.
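The distance-threshold construction can be sketched as below. The sketch rests on assumptions that go beyond the text: the mean distance is taken as the mean over all distinct vertex pairs, membership is decided by the threshold `phi * d_bar`, and member entries are set to `exp(-d / d_bar)`; the function name and toy features are hypothetical.

```python
import numpy as np

def centralized_incidence(X, phi):
    """Incidence matrix of distance-threshold hyperedges, one hyperedge per central vertex.

    X   : (n, d) features of one imaging modality
    phi : hyperparameter scaling the mean distance that bounds each hyperedge
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_bar = dist[np.triu_indices(n, k=1)].mean()   # mean pairwise distance (assumption)
    member = dist <= phi * d_bar                   # column c: vertices near central vertex c
    return np.where(member, np.exp(-dist / d_bar), 0.0)

# Toy features: two nearby subjects and one distant subject.
X = np.array([[0.0], [1.0], [10.0]])
H = centralized_incidence(X, phi=0.5)
w = np.ones(H.shape[1])                            # hyperedge weights initialised to 1
```

Each central vertex has entry 1 in its own hyperedge because its distance to itself is zero, and entries decay toward 0 for members that lie farther from the center.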
For the MCI diagnostic task, which is regarded as a binary classification problem,
various imaging data are employed to construct correlations among subjects using
the centralized hypergraph learning method. At each step, one hypergraph is
selected as the core hypergraph out of the four that were created from four types
of data, with the others offering additional input for updating the hypergraphs.
If hypergraph $\mathcal{G}_j$ is the core, we obtain the j-th centralized hypergraph, and to
learn the relationships among the vertices, the optimization problem can be
written as

$$\arg\min_{F_j, W_i} \left\{ \Omega_c^j(F_j) + \lambda R_{emp}(F_j) + \mu \sum_i \sum_{e \in \mathcal{E}_i} W_i(e)^2 \right\} \quad \text{s.t. } H_i\, \mathrm{diag}(W_i) = \mathrm{diag}(D_v^i),\ \mathrm{diag}(W_i) \geq 0, \tag{10.2}$$

where $\Omega_c^j(F_j)$ is the regularizer that smooths the correlations among vertices,
$R_{emp}$ represents the empirical loss, $\sum_i \sum_{e \in \mathcal{E}_i} W_i(e)^2$ is an $\ell_2$-norm
regularizer on the hyperedge weights, and $D_v^i$ represents the vertex degree matrix.
By assigning different weights $\alpha_1$, $\alpha_2$ to the core hypergraph and the others,
respectively, the regularizer term can be formulated as

$$\Omega_c^j(F_j) = \alpha_1 \Omega_j(F_j) + \alpha_2 \sum_{i \neq j} \Omega_i(F_j), \tag{10.3}$$

where $\Omega_i(F_j)$ is equal to $F_j^T (I - \Theta_i) F_j$ with
$\Theta_i = (D_v^i)^{-1/2} H_i W_i (D_e^i)^{-1} H_i^T (D_v^i)^{-1/2}$.
Consequently, the regularizer is rewritten as $\Omega_c^j(F_j) = F_j^T \Delta_c^j F_j$ with
$\Delta_c^j = I - (\alpha_1 \Theta_j + \alpha_2 \sum_{i \neq j} \Theta_i)$.

The optimization of Eq. (10.2) consists of two alternating steps. First, we
optimize the relevance matrix $F_j$ with fixed $W_i$ as

$$\arg\min_{F_j} \left\{ \Omega_c^j(F_j) + \lambda R_{emp}(F_j) \right\}, \tag{10.4}$$

which results in the closed-form solution
$F_j = \frac{\lambda}{1+\lambda} \left( I - \frac{1}{1+\lambda} \left( \alpha_1 \Theta_j + \alpha_2 \sum_{i \neq j} \Theta_i \right) \right)^{-1} Y$.
Next, we optimize the hyperedge weights $W_i$ with fixed $F_j$ as

$$\arg\min_{W_i} \left\{ \Omega_c^j(F_j) + \mu \sum_i \sum_{e \in \mathcal{E}_i} W_i(e)^2 \right\} \quad \text{s.t. } H_i\, \mathrm{diag}(W_i) = \mathrm{diag}(D_v^i),\ \mathrm{diag}(W_i) \geq 0, \tag{10.5}$$

which can be solved by quadratic programming.
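The closed-form update of the relevance matrix can be sketched in NumPy as follows. The sketch assumes a squared-error empirical loss $\|F_j - Y\|^2$ and weights normalized so that $\alpha_1$ plus the $\alpha_2$ of all non-core hypergraphs sum to one; the function names and random toy hypergraphs are illustrative only.

```python
import numpy as np

def theta(H, w):
    """Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for one hypergraph."""
    d_v = H @ w
    d_e = H.sum(axis=0)
    Dv_inv_sqrt = np.diag(d_v ** -0.5)
    return Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / d_e) @ H.T @ Dv_inv_sqrt

def centralized_relevance(Hs, ws, core, Y, lam, alpha1, alpha2):
    """Closed-form relevance matrix when hypergraph `core` is the core hypergraph."""
    n = Y.shape[0]
    S = sum((alpha1 if i == core else alpha2) * theta(H, w)
            for i, (H, w) in enumerate(zip(Hs, ws)))
    # F = lam/(1+lam) * (I - S/(1+lam))^{-1} Y, computed with a linear solve.
    return (lam / (1.0 + lam)) * np.linalg.solve(np.eye(n) - S / (1.0 + lam), Y)

# Toy data: 5 subjects, 3 hypergraphs, 2 classes; the last subject is unlabeled.
rng = np.random.default_rng(0)
Hs = [np.maximum((rng.random((5, 5)) < 0.5).astype(float), np.eye(5)) for _ in range(3)]
ws = [np.ones(5) for _ in range(3)]
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 0]], dtype=float)
lam = 1.0
F = centralized_relevance(Hs, ws, core=0, Y=Y, lam=lam, alpha1=0.5, alpha2=0.25)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is the usual numerically safer choice; the hyperedge-weight step of Eq. (10.5) would require a quadratic programming solver and is not shown.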


To best integrate data from the various MRI sequences, we learn a weight for every
centralized hypergraph by minimizing the total hypergraph Laplacian regularizer,
which is expressed as

$$\arg\min_{\rho} \left\{ \sum_i \rho_i \Omega_c^i(F_i) + \eta \|\rho\|^2 \right\} \quad \text{s.t. } \sum_i \rho_i = 1, \tag{10.6}$$

where $\rho_i$ represents the weight of the i-th centralized hypergraph, and $\eta$ represents
the trade-off parameter between the Laplacian and the $\ell_2$-norm regularizer. Determined
by the centralized hypergraph weights, the overall relevance matrix is
$F = \sum_i \rho_i F_i$, whose entries can be used to categorize a subject.
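For an objective of the form $\sum_i \rho_i \Omega_i + \eta \|\rho\|^2$ subject to $\sum_i \rho_i = 1$, a closed-form minimizer can be derived with a Lagrange multiplier. The derivation below is a sketch under the assumption that no further constraint (such as non-negativity of the weights) is active; $M$ (the number of centralized hypergraphs), $\zeta$ (the multiplier), and the shorthand $\Omega_i = \Omega_c^i(F_i)$ are notation introduced only for this sketch.

```latex
\begin{aligned}
\mathcal{L}(\rho, \zeta) &= \sum_{i=1}^{M} \rho_i \Omega_i + \eta \sum_{i=1}^{M} \rho_i^2
    + \zeta \Big( 1 - \sum_{i=1}^{M} \rho_i \Big), \\
\frac{\partial \mathcal{L}}{\partial \rho_i} &= \Omega_i + 2 \eta \rho_i - \zeta = 0
    \;\Rightarrow\; \rho_i = \frac{\zeta - \Omega_i}{2 \eta}, \\
\sum_{i=1}^{M} \rho_i = 1 &\;\Rightarrow\; \zeta = \frac{2 \eta + \sum_{k=1}^{M} \Omega_k}{M}, \\
\rho_i &= \frac{1}{M} + \frac{\frac{1}{M} \sum_{k=1}^{M} \Omega_k - \Omega_i}{2 \eta}.
\end{aligned}
```

Under these assumptions, a hypergraph whose regularizer value is below the average receives a weight above $1/M$, and a large $\eta$ pushes all weights toward the uniform value $1/M$.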

In this subsection, we have introduced a centralized hypergraph learning method 
to model patient relationships for MCI identification. For each type of data, hyper- 
graphs are constructed in the framework. In hypergraph learning, one hypergraph 
is chosen as the core hypergraph each time, and the remaining hypergraphs help 
the core hypergraph optimize the relevance matrix for prediction. The method not 
only takes into account the links among subjects but also makes use of a range of
different types of data to improve identification performance.


10.2.2 Medical Image Retrieval 


Medical image retrieval is another crucial application of computer-aided diagnosis
in Alzheimer's disease, along with the classification of patients as MCI or normal
control introduced above. Its main goal is to provide clinicians with relevant MCI
examples that have visually comparable imaging data. Such data can also support
doctors in medical practice through case-based reasoning or evidence-based medicine.

The medical image retrieval technique for aiding MCI diagnosis [2] comprises two
primary stages, i.e., query category prediction for choosing candidates, and
ranking. The first stage finds the subjects in the database that are most relevant
to the query subject. This knowledge is then used to predict, in a supervised
manner, the category of the query subject, i.e., MCI patient or NC in this case.
The graphs built from the pairwise subject distances in the various data
modalities are combined into a multi-graph to predict the category of the query,
after which every subject falling under the same category as the query is regarded
as a candidate. In the second stage, the query subject and all of the candidate
subjects are represented together in a new multi-graph. The learning process on
this multi-graph reveals how closely each candidate is related to the query
subject, allowing the candidates to be ranked by their degree of similarity. The
details of the two stages are shown in Fig. 10.2 [2] and explained below.

Fig. 10.2 The pipeline for medical image retrieval method. This figure is from [2]

The category of the query is first predicted using the subjects in the database,
given the query imaging data, so that candidates can then be chosen based on the
result. To analyze the similarity between the query subject and the training
subjects chosen from the database, a graph $\mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i, W_i)$ with N + 1 vertices is
generated for the imaging data of the i-th modality out of $N_{mod}$ modalities. The
weight $W_i(v_s, v_t)$ of edge $\mathcal{E}_i(v_s, v_t)$, which connects the s-th and t-th vertices of
the graph $\mathcal{G}_i$, is given by

$$W_i(v_s, v_t) = \exp\left( -\frac{d^2(v_s, v_t)}{\sigma^2} \right), \tag{10.7}$$

where $d(v_s, v_t)$ represents the Euclidean distance between vertices $v_s$ and $v_t$ in
the feature space and $\sigma$ is a scaling parameter. Similar to the process of
identifying MCI, the optimization problem for the multi-graph learning task for
query category prediction can be written as

$$\arg\min_{F, \alpha} \left\{ \sum_{i=1}^{N_{mod}} \alpha_i \Omega_i(F) + \mu R(F) + \eta \|\alpha\|_2^2 \right\} \quad \text{s.t. } \sum_{i=1}^{N_{mod}} \alpha_i = 1, \tag{10.8}$$

where $\alpha$ and $F$ represent the weighting parameters and the relevance matrix,
respectively, $\mu$ and $\eta$ represent the trade-off hyperparameters, $R$ represents the
empirical loss, and $\Omega_i$ represents the regularizer term defined as

$$\Omega_i = \frac{1}{2} \sum_{v_s, v_t} W_i(v_s, v_t) \left\| \frac{F(v_s, \cdot)}{\sqrt{D_i(v_s, v_s)}} - \frac{F(v_t, \cdot)}{\sqrt{D_i(v_t, v_t)}} \right\|^2. \tag{10.9}$$
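The edge weights and the graph regularizer can be sketched in NumPy as follows. The sketch assumes a Gaussian-style weight with a scaling parameter `sigma` and takes the degree $D_i(v, v)$ as the row sum of the weight matrix; the function names and random toy data are hypothetical.

```python
import numpy as np

def gaussian_graph(X, sigma):
    """Dense weight matrix W[s, t] = exp(-d^2(v_s, v_t) / sigma^2)."""
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-dist2 / sigma ** 2)

def graph_regularizer(W, F):
    """0.5 * sum_{s,t} W[s,t] * || F[s]/sqrt(D[s]) - F[t]/sqrt(D[t]) ||^2."""
    deg = W.sum(axis=1)                          # vertex degrees (row sums)
    F_norm = F / np.sqrt(deg)[:, None]           # degree-normalised relevance rows
    diff = F_norm[:, None, :] - F_norm[None, :, :]
    return 0.5 * (W * (diff ** 2).sum(axis=-1)).sum()

# Toy data: 5 subjects with 3 features, relevance scores for 2 categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
F = rng.normal(size=(5, 2))
W = gaussian_graph(X, sigma=1.0)
reg = graph_regularizer(W, F)
```

The pairwise form is equivalent to the matrix form $\mathrm{tr}(F^T (I - D^{-1/2} W D^{-1/2}) F)$, which is cheaper to evaluate for large graphs.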
To solve the aforementioned optimization problem, $F$ and $\alpha$ can be optimized
alternately. When $\alpha$ is fixed, the optimization problem for $F$, with graph
regularizers $\Omega_i$ and empirical loss $R$, is written as

$$\arg\min_{F} \left\{ \sum_{i=1}^{N_{mod}} \alpha_i \Omega_i(F) + \mu R(F) \right\}, \tag{10.10}$$

which can be solved using the iterative process [9] formulated as

$$F(t+1) = \frac{1}{1+\mu} \sum_{i=1}^{N_{mod}} \alpha_i \Theta_i F(t) + \frac{\mu}{1+\mu} Y, \tag{10.11}$$

where $\Theta_i = D_i^{-1/2} W_i D_i^{-1/2}$ and $F(t)$ is the result of the t-th iteration,
initialized with $F(0) = Y$. When $F$ is fixed, the optimization problem for $\alpha$ can be
formulated as

$$\arg\min_{\alpha} \left\{ \sum_{i=1}^{N_{mod}} \alpha_i \Omega_i(F) + \eta \|\alpha\|_2^2 \right\} \quad \text{s.t. } \sum_{i=1}^{N_{mod}} \alpha_i = 1, \tag{10.12}$$

which can be solved by applying the Lagrangian method. Based on the learned
category of the query subject, all database subjects belonging to that category are
taken as candidate retrieval results.
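The iterative update of the relevance matrix can be sketched as below, with the graph weights $\alpha$ held fixed. The sketch assumes a squared-error empirical loss, Gaussian edge weights, and the normalized adjacency $D^{-1/2} W D^{-1/2}$ for each graph; the toy data, the parameter values, and the function names are hypothetical.

```python
import numpy as np

def gaussian_graph(X, sigma):
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-dist2 / sigma ** 2)

def normalized_adjacency(W):
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))           # D^{-1/2} W D^{-1/2}

def propagate(Ws, alphas, Y, mu, n_iter=200):
    """Iterate F <- (sum_i alpha_i Theta_i F + mu * Y) / (1 + mu), starting from F = Y."""
    S = sum(a * normalized_adjacency(W) for a, W in zip(alphas, Ws))
    F = Y.copy()
    for _ in range(n_iter):
        F = (S @ F + mu * Y) / (1.0 + mu)
    return F

# Toy data: four labeled database subjects and one query (last row), two modalities.
X1 = np.array([[0.0], [0.5], [10.0], [10.5], [0.2]])
X2 = 2.0 * X1
Ws = [gaussian_graph(X1, sigma=1.0), gaussian_graph(X2, sigma=2.0)]
alphas = [0.5, 0.5]
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 0]], dtype=float)
mu = 0.5
F = propagate(Ws, alphas, Y, mu)
predicted_category = int(F[-1].argmax())         # category of the query subject
```

Because the update is a contraction for $\mu > 0$, the iterates converge to the same matrix as the linear system $(I - S/(1+\mu)) F = \frac{\mu}{1+\mu} Y$; the iteration is simply cheaper when the graphs are large and sparse.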

The candidates are then ranked to retrieve the most relevant subjects. Even though
they belong to the same category as the query subject, they may still differ from
each other in imaging appearance. The candidate subjects and the query subject
are used to construct a graph for each of the $N_{mod}$ modalities, where the i-th
graph is denoted by $\mathcal{G}_i$, in a manner similar to the previous classification step.
Since the graph weights $\alpha$ have been learned, the optimization problem can be
written as


$$\arg\min_{f} \left\{ \sum_{i=1}^{N_{mod}} \alpha_i \hat{\Omega}_i(f) + \mu \hat{\mathcal{R}}(f) \right\}, \tag{10.13}$$

where $f$ and $\hat{\Omega}_i$ represent the relevance vector and the graph regularizer,
respectively, and $\hat{\mathcal{R}}$ is the empirical loss. As with Eq. (10.10), this
optimization task is handled using an iterative procedure, represented by

$$f(t+1) = \frac{1}{1+\mu} \sum_{i=1}^{N_{mod}} \alpha_i \hat{\Theta}_i f(t) + \frac{\mu}{1+\mu} y. \tag{10.14}$$

The ranking of all candidates can be established by sorting them according to the
relevance given by $f$.

This subsection introduces the process of retrieving data relevant to the query 
subject from medical imaging datasets to support the diagnosis of MCI. The first 
stage selects the candidate set from the database, and the second stage computes the 
correlation between the query subject and all of the subjects in the candidate set and 
then ranks the retrieval based on the correlation. Both stages employ multi-graphs 
to describe the relationship between subjects, so as to facilitate retrieval tasks. 


10.2.3 COVID-19 Identification Using CT Imaging 


The COVID-19 pandemic, the most widespread public health crisis since late 2019,
is caused by an extremely infectious virus that can induce multiple organ failure
and severe respiratory distress. It is therefore crucial to correctly distinguish
COVID-19 from other forms of pneumonia in order to design appropriate treatment
programs. The task is difficult for two main reasons: noisy data, which result
from the highly varied data gathered during a crisis, and confusing cases, which
result from the similarity between COVID-19 and other types of pneumonia in the
early phases of symptoms. Numerous investigations have demonstrated that CT is
useful for differentiating COVID-19 from other types of pneumonia, which led to
the introduction of an uncertainty vertex-weighted hypergraph learning strategy
for identifying COVID-19 among other types of pneumonia from CT images [4]. The
method formulates data correlations among instances and limits the interference
of noisy data and confusing cases by employing an uncertainty score quantification
module and a vertex-weighted hypergraph structure. The introduction of the
framework that follows is divided into three parts, namely pre-processing, data
uncertainty measurement, and hypergraph construction and learning. Figure 10.3
illustrates the overall framework.

Fig. 10.3 An illustration of the uncertainty vertex-weighted hypergraph learning method for
identifying COVID-19 among other types of pneumonia. This figure is from [4]

In the pre-processing stage, regional features and radiomics features are
extracted from the CT scan of every patient after segmentation with VB-Net [10].
Regional features include the number of infected lesions and the surface area of
the lesions, whereas radiomics features include textural features such as the
gray-level co-occurrence matrix. The feature representation $X$ of a patient's CT
image is constructed by combining the two categories of features with information
on age and gender.

Data uncertainty measurement is crucial for determining the reliability of
different data throughout the learning process, since noise affects data
quality. Two types of uncertainty are measured: aleatoric and epistemic.
The former results from data abnormalities, noise, or other issues that lower
the data quality, and the latter arises when the features of a case lie near the
decision boundary. The goal of parameter estimation under aleatoric uncertainty
is to minimize the KL divergence between the real and predicted distributions,
which can be represented by


$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} D_{KL}\left( P_D(X_i) \,\|\, P_\theta(X_i) \right), \tag{10.15}$$

where $P_D(X_i)$ and $P_\theta(X_i)$ represent the real distribution and the predicted
distribution, respectively. For optimization, the loss function is expressed as


$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2} \exp(-\alpha_\theta(X_i)) \, CE(y_i, f_\theta(X_i)) + \frac{1}{2} \alpha_\theta(X_i) \right), \tag{10.16}$$

where $\alpha_\theta(X_i)$ represents the log value of the estimated variance, and the
aleatoric uncertainty is defined as $A_\theta(X_i) = \exp(\alpha_\theta(X_i))$. Dropout can be
used at inference time to determine the epistemic uncertainty, which reflects the
model's inability to generate accurate predictions and is written as


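$$\mathcal{E}(f_\theta(X_i)) \approx \frac{1}{K} \sum_{k=1}^{K} f_{\hat{\omega}_k}(X_i)^{\top} f_{\hat{\omega}_k}(X_i) - \mathbb{E}(f_{\hat{\omega}_k}(X_i))^{\top} \mathbb{E}(f_{\hat{\omega}_k}(X_i)), \tag{10.17}$$

where $\omega$ represents the set of random variables and k indexes the k-th test with
dropout. The overall uncertainty is $\Upsilon_\theta(X_i) = A_\theta(X_i) + \mathcal{E}(f_\theta(X_i))$. With
normalization, the final uncertainty score can be formulated as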


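$$U_i = \sigma\left( \frac{\Upsilon_\theta(X_i) - \mu_\Upsilon}{s_\Upsilon} \right), \tag{10.18}$$

where $\mu_\Upsilon$ and $s_\Upsilon$ represent the mean and the standard deviation of $\Upsilon$, and
$\sigma$ stands for the sigmoid function, which maps the output to between 0 and 1.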
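
The uncertainty score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation. The inputs `log_var` (the predicted log-variance per instance) and `mc_probs` (the class probabilities from K stochastic dropout passes) are hypothetical names, and the small constant added to the standard deviation is only a numerical safeguard.

```python
import numpy as np


def uncertainty_scores(log_var, mc_probs):
    """Combine aleatoric and epistemic uncertainty into a score in (0, 1).

    log_var  : array of shape (N,), predicted log-variance for each instance
    mc_probs : array of shape (K, N, C), predictions of K dropout passes
    """
    # Aleatoric part: exponential of the predicted log-variance.
    aleatoric = np.exp(log_var)

    # Epistemic part, in the spirit of Eq. (10.17):
    # mean of f^T f over the K passes minus E[f]^T E[f].
    second_moment = (mc_probs ** 2).sum(axis=2).mean(axis=0)
    mean_pred = mc_probs.mean(axis=0)
    epistemic = second_moment - (mean_pred ** 2).sum(axis=1)

    # Standardize the total uncertainty and squash it with a sigmoid, Eq. (10.18).
    total = aleatoric + epistemic
    z = (total - total.mean()) / (total.std() + 1e-12)
    return 1.0 / (1.0 + np.exp(-z))
```

Instances whose dropout predictions disagree strongly, or whose predicted variance is large, receive scores closer to 1.
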

Each instance is viewed as a vertex in the hypergraph, which is constructed to mine
high-order correlations among related patients for more precise prediction. Regional
and radiomics features are used separately to construct hyperedges. In the
regional feature space, every vertex is in turn regarded as a center vertex, and
the nearest neighbor algorithm is used to link its K nearest vertices into a
hyperedge. The same method is applied to generate hyperedges in the radiomics
feature space. In contrast to the usual hypergraph, the uncertainty hypergraph
must take into account both the connection relationship and the uncertainty score
of each vertex, which leads to a generalized incidence matrix for the uncertainty
vertex-weighted hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W, U)$:


$$H(v_i, e_j) = \begin{cases} U_i & \text{if } v_i \in e_j \\ 0 & \text{otherwise} \end{cases}. \tag{10.19}$$

Compared with conventional hypergraph learning strategies, this structure
quantifies data uncertainty, and its optimization objective can be expressed as


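$$\begin{aligned}
Q_U(F) &= \arg\min_{F} \left\{ \Omega(F) + \lambda \mathcal{R}_{emp}(F) \right\} \\
\Omega(F, \mathcal{V}, U, \mathcal{E}, W) &= \mathrm{tr}\left( F^{\top} (U^{\top} - U^{\top} \Theta_U U) F \right) \\
\mathcal{R}_{emp}(F, U) &= \sum_{k=1}^{K} \left\| U \left( F(:, k) - Y(:, k) \right) \right\|^2
\end{aligned} \tag{10.20}$$

where $\Omega(\cdot)$ and $\mathcal{R}_{emp}(\cdot)$ represent the regularizer and the empirical loss,
respectively, and $\Theta_U$ is equal to $D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}$. The empirical
loss can be rewritten as

$$\mathcal{R}_{emp}(F, U) = \mathrm{tr}\left( F^{\top} U^{\top} U F + Y^{\top} U^{\top} U Y - 2 F^{\top} U^{\top} U Y \right). \tag{10.21}$$

The output matrix $F \in \mathbb{R}^{N \times K}$ ($K$ representing the number of classes, i.e.,
$K = 2$ in this case) is thus given by

$$F = \lambda \left( U^{\top} - U^{\top} \Theta_U U + \lambda U^{\top} U \right)^{-1} U^{\top} U Y. \tag{10.22}$$

New test cases can be classified as COVID-19 or other pneumonia types
using the output label matrix obtained above.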
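
To make the construction and the closed-form solution concrete, the following NumPy sketch builds a vertex-weighted incidence matrix as in Eq. (10.19) and evaluates Eq. (10.22). It is an illustration under assumptions rather than the original code: the function names are hypothetical, a single feature space is used, each hyperedge contains its center vertex together with its nearest neighbors, and $U$ is taken to be the diagonal matrix of uncertainty scores so that $U^{\top} = U$.

```python
import numpy as np


def knn_incidence(X, k, u):
    """H(v, e) = u[v] if vertex v belongs to hyperedge e, else 0 (Eq. 10.19).

    Hyperedge e gathers centre vertex e and its k - 1 nearest neighbours.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(d2[e])[:k]  # the centre itself has distance 0
        H[members, e] = u[members]
    return H


def hypergraph_theta(H, w):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w              # vertex degrees
    de = H.sum(axis=0)      # hyperedge degrees
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H * (w / de)[None, :]
    return left @ (H.T * dv_inv_sqrt[None, :])


def solve_labels(H, w, u, Y, lam):
    """Closed-form label matrix of Eq. (10.22) with diagonal U = diag(u)."""
    U = np.diag(u)
    theta = hypergraph_theta(H, w)
    A = U - U @ theta @ U + lam * (U @ U)
    return lam * np.linalg.solve(A, U @ U @ Y)
```

Rows of `Y` that belong to unlabeled test cases are set to zero, and the predicted class of each case is the column of `F` with the largest value.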


10.2.4 ASD Identification Using Brain Functional Networks 


Autism spectrum disorder (ASD) is a widespread developmental disorder that
mostly affects children and has negative effects such as impaired social
communication. Because the number of cases is rising, early identification and
treatment of ASD are crucial so that patients can acquire new skills under
clinical supervision. The diagnosis of ASD depends mostly on skilled specialists,
and it is difficult to identify ASD quickly due to the shortage of experts. The
correlation of various functional connectivity (FC) pattern features in ASD
patients can be used for rapid diagnosis.

Fig. 10.4 A pipeline to classify ASD or healthy controls from brain functional networks data using
dynamic hypergraph learning. This figure is from [3]

The ASD identification method using brain functional networks [3] is divided
into three stages, namely the selection of pre-processed features, hypergraph
construction, and object identification using dynamic hypergraph learning. The
overall process is shown in Fig. 10.4. In the first stage, static FC (sFC) and
dynamic FC (dFC) are produced using a sliding window algorithm on the original
functional magnetic resonance imaging time series, and Lasso regression is then
employed for feature selection. The hypergraph construction stage creates a
hypergraph based on the comparison of image features that represent data
similarity in multiple modalities. Finally, ASD is identified using a multi-modal
dynamic hypergraph learning technique that detects ASD and simultaneously
improves the hypergraph structure.

The feature selection stage aims to discover valuable features in the dFC and sFC
sequences. The i-th subject's sFC sequence of $\tau$ time points is first separated
into n sub-sequences, with the j-th sub-sequence consisting of the time points
$\{j, n + j, 2n + j, \ldots\}$. Defining $z_i^j$ as the dynamic FC feature of the j-th
sub-sequence of subject i, the Lasso regression model, which serves as the
selection operator, can be expressed as

$$\arg\min_{\beta} \left( \frac{1}{2\tau'} \sum_{i} \sum_{j=1}^{n} \left( y_i - \beta^{\top} z_i^{j} \right)^2 + \mu \|\beta\|_1 \right), \tag{10.23}$$

where $\tau' = \tau / n$ is the length of the sub-sequences, $y_i$ represents the label
of the subject, $\beta$ is the regression coefficient, and $\mu$ stands for the trade-off
hyperparameter. Features with zero coefficients are discarded, and the remaining
ones are denoted as $z_i^j$. Defining $x_i$ as the static FC feature of the i-th
subject, the Lasso regression model is expressed as


$$\arg\min_{\gamma} \left( \frac{1}{2} \sum_{i} \left( y_i - \gamma^{\top} x_i \right)^2 + \eta \|\gamma\|_1 \right), \tag{10.24}$$

where $y_i$ represents the label of the subject, $\gamma$ is the regression coefficient,
and $\eta$ stands for the trade-off hyperparameter. As with dFC, the features with
non-zero coefficients under the sFC selection operator are selected and denoted
as $x_i$.
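
The role of Lasso as a selection operator can be illustrated with a small coordinate-descent solver. The sketch below is an assumption-laden illustration, not the solver used in [3]: it minimizes a mean squared error with an L1 penalty (the normalization by the number of rows is our choice), and the names `lasso_cd` and `select_features` are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_cd(X, y, mu, n_sweeps=200):
    """Minimize 1/(2m) * ||y - X b||^2 + mu * ||b||_1 by cyclic coordinate descent."""
    m, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / m
    r = y.astype(float).copy()            # residual y - X b
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]           # remove the contribution of feature j
            rho = X[:, j] @ r / m
            b[j] = soft_threshold(rho, mu) / col_sq[j]
            r -= X[:, j] * b[j]           # add the updated contribution back
    return b


def select_features(X, y, mu):
    """Indices of the features whose Lasso coefficient is non-zero."""
    return np.flatnonzero(lasso_cd(X, y, mu) != 0.0)
```

A larger trade-off value `mu` drives more coefficients exactly to zero, so fewer features survive the selection.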

The dFC sub-hypergraph $\mathcal{G}_z = (\mathcal{Z}, \mathcal{E}_z)$ and the sFC sub-hypergraph
$\mathcal{G}_x = (\mathcal{X}, \mathcal{E}_x)$, in which every vertex stands for a sub-sequence of a subject,
are combined to construct the hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, i.e., $\mathcal{E} = \mathcal{E}_z \cup \mathcal{E}_x$.
Since sFC features are defined at the subject level, the sFC features of the
sub-sequences are inherited from the static modality of the subject, i.e.,
$x_i^j = x_i$. Each vertex in each sub-hypergraph is regarded as a central vertex,
and the nearest neighbor algorithm is employed to connect k neighbors
($k = 2n, 3n, \ldots, k_{max} n$) to create $k_{max}$ hyperedges. Once the two
sub-hypergraphs are generated, the hypergraph is formed at the same time, and its
incidence matrix is expressed as

$$H(v, e) = \begin{cases} 1 & \text{if } v \in e \\ 0 & \text{otherwise} \end{cases}. \tag{10.25}$$

To enhance the structure of the hypergraph and to help predict ASD, the potential
function of a hyperedge can be defined as


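$$f(e) = \sum_{u, v \in \mathcal{V}} \frac{H(u, e) H(v, e) g(u, v)}{\delta(e)}, \tag{10.26}$$

where

$$g(u, v) = \frac{1}{2(1 + \alpha_1 + \alpha_2)} \left( \left\| \frac{\hat{y}_u}{\sqrt{d(u)}} - \frac{\hat{y}_v}{\sqrt{d(v)}} \right\|_2^2 + \alpha_1 \left\| \frac{x_u}{\sqrt{d(u)}} - \frac{x_v}{\sqrt{d(v)}} \right\|_2^2 + \alpha_2 \left\| \frac{z_u}{\sqrt{d(u)}} - \frac{z_v}{\sqrt{d(v)}} \right\|_2^2 \right). \tag{10.27}$$

Here $\delta(e)$ represents the degree of hyperedge e, $d(\cdot)$ is the vertex degree,
$\hat{y}_u$ and $\hat{y}_v$ stand for the to-be-learned labels of u and v, respectively, and
$\alpha_1$ and $\alpha_2$ are the trade-off hyperparameters. Note that the potential function
determines the data distribution on the hyperedge jointly from the sFC, dFC, and
label spaces. The cost function of dynamic hypergraph learning is formulated as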

$$\mathcal{L}(\hat{y}, H) = \sum_{e \in \mathcal{E}} \omega(e) f(e) + \theta \|y - \hat{y}\|_2^2 + \lambda \|H - H_0\|_F^2, \tag{10.28}$$

where $\omega(e)$ stands for the weight of the hyperedge, $H_0$ represents the initial
hypergraph, and $\theta$ and $\lambda$ are the trade-off hyperparameters. The objective
function consists of three terms: the first term is the loss function based on
the hypergraph, and the following two terms are the empirical losses of $\hat{y}$ and
$H$. The optimization of Eq. (10.28) consists of two stages. First, we optimize the
to-be-learned labels $\hat{y}$ with $H$ fixed. The problem has the closed-form solution

$$\hat{y} = \left( I + \frac{1}{\theta (1 + \alpha_1 + \alpha_2)} \Delta \right)^{-1} y, \tag{10.29}$$

where $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}$, and $I$, $D_v$, and $D_e$ represent the
identity matrix, vertex degree diagonal matrix, and hyperedge degree diagonal
matrix, respectively. Next, we optimize $H$ with $\hat{y}$ fixed as


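$$\mathcal{Z}(H) = \mathrm{tr}\left( \left( I - D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} \right) K \right) + \lambda \|H - H_0\|_F^2, \tag{10.30}$$

where $K = (\hat{y}\hat{y}^{\top} + \alpha_1 X X^{\top} + \alpha_2 Z Z^{\top}) / (1 + \alpha_1 + \alpha_2)$. This problem is solved
with the projected gradient method, whose iterative procedure is formulated as

$$\begin{aligned}
H_{k+1} &= P\left[ H_k - h_k \nabla \mathcal{Z}(H_k) \right] \\
\nabla \mathcal{Z}(H) &= 2\lambda (H - H_0) + J \left( I \otimes H^{\top} D_v^{-1/2} K D_v^{-1/2} H \right) W D_e^{-2} \\
&\quad + D_v^{-3/2} H W D_e^{-1} H^{\top} D_v^{-1/2} K J W \\
&\quad - 2 D_v^{-1/2} K D_v^{-1/2} H W D_e^{-1}
\end{aligned} \tag{10.31}$$

where $J = \mathbf{1}\mathbf{1}^{\top}$, $h_k$ represents the optimization step size of the k-th
iteration, and $P$ stands for the projection onto the set $\{H \mid 0 \le H \le 1\}$. When
the iterative process converges, the labels of the sub-sequences of each subject
are aggregated, and the predicted category is the one with the highest score
after aggregation.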
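
The two building blocks of this alternating procedure that have a simple closed form, namely the label update of Eq. (10.29) and the projection $P$, can be sketched as follows. The sketch is illustrative only. The gradient step of Eq. (10.31) is deliberately omitted, and the function and argument names (for example `theta_tradeoff` for $\theta$) are assumptions.

```python
import numpy as np


def hypergraph_laplacian(H, w):
    """Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w              # vertex degrees
    de = H.sum(axis=0)      # hyperedge degrees
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H * (w / de)[None, :]
    theta = left @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(H.shape[0]) - theta


def update_labels(H, w, y, theta_tradeoff, a1, a2):
    """Closed-form label update of Eq. (10.29) for a fixed incidence matrix H."""
    delta = hypergraph_laplacian(H, w)
    A = np.eye(H.shape[0]) + delta / (theta_tradeoff * (1.0 + a1 + a2))
    return np.linalg.solve(A, y)


def project_incidence(H):
    """Projection P onto the set {H | 0 <= H <= 1} used in Eq. (10.31)."""
    return np.clip(H, 0.0, 1.0)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is the usual choice when only the product with `y` is needed.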

In this section, we demonstrate the use of hypergraph-based approaches in
four computer-aided diagnosis applications, namely MCI identification, medical
image retrieval for MCI diagnostic assistance, COVID-19 identification, and ASD
identification. In these applications, hypergraphs are employed to represent
high-order connections among subjects, mining the complex links among patients to
gather more knowledge than their images alone provide. In the future, it could be
valuable to use hypergraphs to investigate few-shot learning approaches and
transfer learning strategies in medical domains such as MCI, COVID-19, and ASD.


10.3 Survival Prediction with Histopathological Image 


Survival prediction models survival duration, i.e., the period during which a
patient is followed up until a certain event, e.g., cancer recurrence or death.
Survival prediction based on histopathological images aims to predict the survival
duration or survival risk to a satisfactory degree using only the patient's
images, so as to estimate the severity or to distinguish high and low risks,
which helps the pathologist evaluate the case. Since histopathological images
typically contain gigapixels and are far more detailed than regular natural
images, e.g., those in ImageNet [11] or MNIST [12], the main challenge of this
task is how to reliably obtain the patient's feature representation for
regression analysis. Moreover, the relevant information about cells and tissues
may not be readily extracted, as histopathological images contain complex
relationships and rich morphological structural content.

To cope with the large number of pixels, one technique [13] randomly chooses
patches in histopathological images that contain a variety of cells and no blank
regions. It extracts patch features using a pre-trained CNN and calculates
survival risk using Lasso-Cox [14] regression. To enhance the patient
representation, low-level patch features produced by a pre-trained CNN-based
feature extractor can be refined by a graph convolutional neural network that
models the intricate relationships between patches [15]. The limited ability of
random patch selection to cover the details of the original histopathological
image and the lack of mutual information between patches limit the representation
learning capability of the non-graph-based method, whereas the graph-based method
applies pairwise correlation modeling to make up for the loss of structural
information among cells with similar roles. Nevertheless, reducing complex
high-order connections into pairwise relationships inevitably results in
inaccurate modeling and loses data correlations among cells and tissues that are
necessary to predict survival. Hence, a better solution is to model high-order
correlations with hypergraph computation approaches.

The following subsections explain how hypergraph computation is used for survival
prediction based on histopathological images in two parts, namely ranking-based
survival prediction [5] and phenotypic and topological hypergraph-based survival
prediction [6]. In the first part, a nearest-neighbor-based hypergraph modeling
methodology is introduced, and optimization is achieved using a ranking-based
method. In the second part, hypergraphs are created in the feature space and the
image space and merged for prediction.


10.3.1  Ranking-Based Survival Prediction 


This part describes the three stages required for executing the ranking-based survival 
prediction task via hypergraph representation [5], namely pre-processing before 
generating hypergraph, learning hypergraph representation, and survival ranking 
prediction, as illustrated in Fig. 10.5. It is worth noting that these three components 
are related to the framework of the graph-based survival prediction task in general, 
not just the rank-based survival hypergraph framework. 

In the pre-processing stage, N patches are randomly chosen from each
histopathological image, and each patch has the same size as a typical natural
image (e.g., 224 px × 224 px). Choosing patches at random directly from the
original image, however, is likely to pick up noisy regions as well (e.g., erosion
and blank areas). Therefore, before random sampling, the OTSU algorithm [16] is
applied to segment the tissue regions that carry rich information. Next, the
patch-level visual features $X^{(1)} \in \mathbb{R}^{N \times F}$ are extracted by a deep neural
network pre-trained on ImageNet [11], where $F$ represents the dimension of each
patch feature. The raw features retrieved from the pre-trained model are image
features that suit the layered patterns of complex tissue and reflect the cells
and tissues present in the patch.

Fig. 10.5 A pipeline of ranking-based survival prediction utilizing hypergraph representation,
including pre-processing, hazard prediction via hypergraph representation, and ranking-based
survival risk prediction. This figure is from [5]

After pre-processing extracts feature information at the patch level, the
hypergraph computing approach is used to produce features at the
histopathological image level for the subsequent prediction of the survival risk
score. Hypergraphs are created using the distance-based hypergraph generation
method since, intuitively, cells and tissues with similar morphologies have
comparable functionalities. Each patch is regarded as a vertex, and each vertex
in turn serves as the center vertex of a hyperedge. This results in a total of N
vertices and N hyperedges in the hypergraph, which reflects the structural
information of the histopathological image. Hyperedges are built using the k
nearest neighbor approach, which connects the k vertices whose raw features have
the smallest Euclidean distance to the center vertex, yielding the hypergraph
incidence matrix $H$. Beyond pairwise graph structures, hierarchical grouping
patterns can be discovered using a hyperedge structure, which creates a channel
for the transfer and integration of information from the k nearest morphological
patches. The information fusion among patch vertices is then accomplished using
hypergraph convolutional layers:

$$X^{(l+1)} = \sigma\left( D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X^{(l)} \Theta^{(l)} \right), \tag{10.32}$$

where $X^{(l)} \in \mathbb{R}^{N \times C_l}$ is the input feature of the l-th convolution layer with N
vertices and $C_l$ dimensions, $X^{(l+1)}$ is the output feature of the l-th convolution
layer, $\sigma$ stands for the nonlinear activation function, and $\Theta^{(l)}$ represents
the learnable parameters of the l-th layer. After L layers of convolution, the
output $X^{(L+1)}$ of the last layer is used to predict survival duration, where the
N hyperedges might reflect N patterns of causal variables. $X^{(L+1)}$ is squeezed
into the patient representation $X \in \mathbb{R}^{1 \times C_{L+1}}$ via a pooling layer, and the
predicted survival risk score is then regressed using a fully connected neural
network. The patient's actual survival time $t$ can be used to supervise the
backpropagation process of the regression.
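
The patch-level hypergraph and one convolution layer of Eq. (10.32) can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the network of [5]: each hyperedge is taken to contain its center patch and the k - 1 nearest patches, the activation is a ReLU, unit hyperedge weights are used by default, and the function names are hypothetical.

```python
import numpy as np


def knn_hypergraph(X, k):
    """Incidence matrix with one hyperedge per patch (centre plus k - 1 neighbours)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    H = np.zeros((n, n))
    for e in range(n):
        H[np.argsort(d2[e])[:k], e] = 1.0  # the centre itself has distance 0
    return H


def hgnn_layer(X, H, Theta, w=None):
    """One layer of Eq. (10.32) with a ReLU activation (assumption)."""
    w = np.ones(H.shape[1]) if w is None else w
    dv = H @ w              # vertex degrees
    de = H.sum(axis=0)      # hyperedge degrees
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    G = (dv_inv_sqrt[:, None] * H * (w / de)[None, :]) @ (H.T * dv_inv_sqrt[None, :])
    return np.maximum(G @ X @ Theta, 0.0)


def patient_representation(X_last):
    """Mean pooling over patches gives one vector per image (pooling type is an assumption)."""
    return X_last.mean(axis=0, keepdims=True)
```

Stacking several calls to `hgnn_layer` and pooling the last output yields the image-level vector that a regression head would consume.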

In addition to the survival duration of each individual patient, ranking
information, which can be used to infer the conditions of nearby patients, is
also significant in regression tasks. Moreover, the ranking data directly reflect
the ordering of patients from high to low risk. The prediction of survival
ranking is therefore introduced as the final and most significant stage. Pairs of
histopathological images (i.e., pairs of patients) should be taken into
consideration, since a model trained on single images cannot reliably distinguish
the relative risks of two similar instances, which is the most frequent reason
for inaccurate patient risk comparisons. To fine-tune the model parameters and
enhance the accuracy of the predicted ranking, a Bayesian-based method known as
Bayesian Concordance Readjust (BCR) is presented. The BCR loss function, which is
employed in pairwise training of histopathological images, can be formulated as
follows:

$$L = -\log\left( \sigma\left( W \cdot (X_i - X_j) \right) \right), \tag{10.33}$$

where $X_i$ and $X_j$ stand for the feature representations of patients i and j,
respectively, and $W$ represents the learnable parameters of the regression.
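
A pairwise loss of this form can be evaluated in a numerically stable way, as the following sketch shows. It assumes that the squashing function in Eq. (10.33) is the logistic sigmoid and that the regression parameters form a single weight vector `w`. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np


def pairwise_ranking_loss(x_i, x_j, w):
    """-log(sigmoid(w . (x_i - x_j))), computed as log(1 + exp(-z)) for stability."""
    z = float(np.dot(w, x_i - x_j))
    return float(np.logaddexp(0.0, -z))
```

The loss is small when the regressed score of patient i exceeds that of patient j by a wide margin, and it grows roughly linearly when the pair is ordered the wrong way round.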

In this subsection, we describe a ranking-based survival prediction method for
predicting a patient's survival hazard score from a single whole-slide image
(WSI). The method first extracts informative patches from the WSI and then
applies a hypergraph to describe the correlations among patches and to create
overall features of the WSI. Finally, the method considers relative ranking
information among patients and achieves better prediction results.


10.3.2 Phenotypic and Topological Hypergraph Modeling 


A hypergraph that mines high-order correlations in the data is essential for
accurately generating the feature representation of histopathological images.
The ranking-based survival prediction method presented above only employs the
nearest neighbor generation method when constructing a hypergraph. That method
only fine-tunes image features among patches with similar features and mines
high-order relationships from a single perspective, which tends to leave other
informative high-order relationships out. Therefore, here we describe a
multi-hypergraph-based learning method for survival prediction [6], which
efficiently obtains a high-order global representation of the histopathological
image by modeling several kinds of hyperedge correlations in different spaces
with a basic hypergraph convolutional network.

Fig. 10.6 Patch sampling and low-level feature extraction. This figure is from [6]

The goal of multi-hypergraph modeling is to uncover topological linkages among
patches in the image space and high-order connections among patches in the latent
feature space. The random sampling approach employed previously cannot be used
here, since it is essential to analyze the topological connections in the image
space; instead, sampling is carried out according to the position of the patch in
the original image. The sampling process therefore uses a boundary-to-center
strategy (shown in Fig. 10.6) after the OTSU algorithm [16] filters out noisy
regions to produce informative regions of interest. In addition to the border $B$
and the center $C$ of each region of interest, patches are chosen at distance
ratios of 1/4, 1/2, and 3/4, denoted $B_4^1$, $B_2^1$, and $B_4^3$ in Fig. 10.6, from the
boundary to the center. Patches at the same percentage of the distance from the
border within the same region of interest, as well as the centers of different
regions, can be regarded as correlated in the image space.

A multi-hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is constructed by joining two sub-hypergraphs,
namely a phenotypic sub-hypergraph $\mathcal{G}_{phe} = (\mathcal{V}, \mathcal{E}_{phe})$ created from the latent
feature space and a topological sub-hypergraph $\mathcal{G}_{top} = (\mathcal{V}, \mathcal{E}_{top})$ generated from
the image space, i.e., $\mathcal{G} = \mathcal{G}_{phe} \cup \mathcal{G}_{top}$, as shown in Fig. 10.7. As explained
for the previous method, the incidence matrix of the phenotypic sub-hypergraph
$H_{phe}$ is built with the k nearest neighbor method based on the Euclidean
distances between the extracted patch visual features. In the incidence matrix of
the topological sub-hypergraph $H_{top}$, each vertex is linked to its neighbors in
the topological space, i.e., the centers of all regions of interest, $B_4^1$,
$B_2^1$, $B_4^3$, and the boundaries of each region of interest.

Fig. 10.7 Construction of multi-hypergraph, which contains a phenotypic sub-hypergraph and a
topological sub-hypergraph. This figure is from [6]

The standard hypergraph neural network is modified into a hypergraph max-mask
convolution with an increased number of hyperedges, which can address the
overfitting issue caused by a lack of training data. The convolutional process of
each layer consists of four steps, namely hyperedge feature gathering, max-mask
operation, vertex feature aggregating, and vertex feature re-weighting.

In the first step, the features of each hyperedge $F_e$ are gathered from the
vertices that are directly linked to it, which can be written as a product of $H$
and $X^{(l)}$. The hyperedge features $F_e^{(l+1)}$ of the convolutional layer are then
produced by performing a max-mask operation on the features, excluding the
$\lambda$ dominating hyperedges. In the final two steps, the output vertex features
are obtained by aggregating the hyperedge features through multiplication by the
matrix $H^{\top}$ and re-weighting them with a learnable parameter $\Theta$, respectively.
The steps of each layer of the hypergraph neural network in the framework are
therefore formulated as


| XD — old - Lx? -HmB!g- Lxe»e | (10.34) 


git) = Had = L)x® ES xo 

where $\tilde{X}^{(l)}$ stands for an offset matrix containing only the data from the
dominant $\lambda$ hyperedges, and $H^{\top}(I - L)X^{(l)}$ ensures that computing gradients
and adjusting vertex features have no impact on the top $\lambda$ hyperedges.

With two learnable weight vectors, the vertex feature matrix $X^{(L+1)}$ and the
hyperedge feature matrix $F_e^{(L+1)}$ of the final layer are squeezed into feature
vectors. The feature fusion module then merges the two vectors to establish a
global feature representation of the entire hypergraph, i.e., of the
histopathological image, for the regression task.

In this section, we introduce a general framework and a ranking-based
optimization method for survival prediction using histopathological images. The
challenges of survival prediction are then further addressed by replacing the
single nearest neighbor modeling algorithm with the multi-hypergraph modeling
method. The transformer is a commonly used model for long sequential data, and
histopathological images also contain a significant amount of sequential
topological information, which makes it conceivable to incorporate transformers
into the survival prediction task. In future work, we can therefore attempt to
include transformers in the feature extraction or hypergraph construction
component of the framework.


10.4 Drug Discovery 


Predicting drug-target interactions (DTIs) is a critical step in the process of
discovering new drugs to treat diseases. However, the biochemical experimental
methods commonly used in wet laboratories are costly and tedious. The growing
need for low-cost, effective, and efficient DTI prediction has prompted the
development of computational drug discovery methods, among which machine
learning-based methods are some of the most promising. The core idea of these
methods is that similar targets may be linked with similar drugs, and vice versa.
This assumption de facto implies potential high-order associations between drugs
and targets, especially when considering the complex heterogeneous biological
networks that contain different biological entities such as proteins.

In the DTI network, a single drug may interact with a group of targets,
which can be generalized as a "one-to-many" pattern. In the aforementioned
heterogeneous biological networks, the interactions between biological entities
become more complex and follow a "many-to-many" pattern. The hypergraph
structure, which can naturally model high-order correlations owing to its
flexible hyperedges, is suitable for modeling such a complex heterogeneous
biological network. It can conveniently incorporate multiple complex interactions
between different biological entities, and the hypergraph computing technique can
then be used to learn the correlations.

In this section, we present a heterogeneous hypergraph learning method for
the DTI prediction task (HHDTI) [7]. The overall pipeline of the framework is
illustrated in Fig. 10.8. It takes into consideration different types of
interactions between biological entities (e.g., drug-target, drug-disease, and
target-disease interactions) to facilitate DTI predictions.


(1) Heterogeneous Hypergraph Modeling 
The overall procedure for modeling biological networks into a heterogeneous
hypergraph is illustrated in Fig. 10.9. Given a heterogeneous biological network
with different kinds of biological entities and interactions among these entities,
the goal of hypergraph modeling is to represent the heterogeneous biological
network as a heterogeneous hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Here
$\mathcal{V} = \{\mathcal{V}_1 \cup \mathcal{V}_2 \cup \ldots \cup \mathcal{V}_o\}$ indicates the vertex set, and
$\mathcal{E} = \{\mathcal{E}_1 \cup \mathcal{E}_2 \cup \ldots \cup \mathcal{E}_r\}$ is the hyperedge set, where $o$ and $r$ are the
numbers of entity types and interaction types, respectively. Specifically, we
have $\mathcal{V}_i = \{v_1, v_2, \ldots, v_{M_i}\}$ with $M_i$ vertices and
$\mathcal{E}_j = \{e_1, e_2, \ldots, e_{N_j}\}$ with $N_j$ hyperedges.

In the heterogeneous biological network discussed here, the set of entity types
$O$ contains drug, target, and disease. The set of interaction types $R$ includes
dr-ta, ta-dr, dr-di, and ta-di interactions.¹ Therefore, $o$ is equal to 3 and $r$
is equal to 4.

Moreover, multiple sub-hypergraphs can be constructed on the basis of the overall
heterogeneous hypergraph, with one sub-hypergraph corresponding to one type of
correlation. Four sub-hypergraphs, i.e., four incidence matrices, are therefore
acquired in all. They are denoted as $H_j \in \mathbb{R}^{M \times N_j}$, $j \in [1, r]$, where $M$ is
the number of vertices of the two types corresponding to the correlation.
Specifically, the four incidence matrices generated based on $R$ are defined as
$\{H_{dr-ta}, H_{ta-dr}, H_{dr-di}, H_{ta-di}\}$. Figure 10.10 shows an example of a drug
hypergraph.
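
As an illustration of how such incidence matrices can be assembled, the sketch below builds a drug hypergraph in which each hyperedge connects all drugs that share one target, as in the example of Fig. 10.10. The interaction list, the helper name `incidence_from_interactions`, and the use of the transpose for the ta-dr sub-hypergraph are assumptions for illustration only.

```python
import numpy as np


def incidence_from_interactions(pairs, n_rows, n_cols):
    """Entry (r, c) is 1 when row entity r interacts with column entity c.

    Rows play the role of vertices and columns the role of hyperedges.
    """
    H = np.zeros((n_rows, n_cols))
    for r, c in pairs:
        H[r, c] = 1.0
    return H


# Hypothetical toy data: (drug index, target index) pairs.
dti_pairs = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1)]

# Drug hypergraph: vertices are drugs, one hyperedge per target.
H_dr_ta = incidence_from_interactions(dti_pairs, n_rows=4, n_cols=2)

# Target hypergraph: vertices are targets, one hyperedge per drug (assumption).
H_ta_dr = H_dr_ta.T
```

In this toy example the second hyperedge of `H_dr_ta` groups drugs 1, 2, and 3 because they all interact with target 1, which is exactly the "one-to-many" pattern that a pairwise graph would have to break into separate edges.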


(2) Drug and Target Embedding Learning 

The same framework is used to create the overall embeddings for both drugs
and targets. We now briefly introduce how this framework learns drug and target
embeddings.

The overall embeddings are acquired by combining the main embeddings and
the assisted embeddings. In particular, the main embeddings, which are learned
from direct DTIs, provide the primary vector representations of all drugs and
targets. In contrast, the assisted embeddings offer supplementary information
derived from disease-related data, such as the dr-di and ta-di connections.

^1 dr, ta, and di are abbreviations of drug, target, and disease, respectively.

Fig. 10.9 The overall procedure for modeling biological networks into a heterogeneous
hypergraph. This figure is from [7]

Fig. 10.10 An example of a drug hypergraph. Each vertex on the hypergraph represents a drug,
and each hyperedge connects all the drugs that share the same target

We first take a drug as an example to demonstrate the learning framework.
The drug's main embeddings $\Phi^{k}_{dr}$ are learned from $H_{dr-ta}$ using an unsupervised
Bayesian deep generative model, i.e., a hypergraph variational auto-encoder, while the
drug assisted embeddings $\Phi^{s}_{dr}$ are generated from $H_{dr-di}$ by leveraging
hypergraph neural networks (HGNN) [17]. For the main embedding learning, given the DTI
sub-hypergraph structure $H_{dr-ta}$, the Bayesian deep generative model serves as a
vertex encoder [18] to explore the potential associations between drugs linked with
one target. This method conducts a nonlinear mapping to transform the hypergraph
structure $H_{dr-ta}$ from the observed space into the shared space $\Phi'_{dr-ta}$ as


$$\Phi'_{dr-ta} = f\left(H_{dr-ta} W_{dr-ta} + b_{dr-ta}\right), \tag{10.35}$$

where the activation function $f(\cdot)$ is nonlinear. The hyperbolic tangent
$\tanh(x) = \left(\exp(x) - \exp(-x)\right) / \left(\exp(x) + \exp(-x)\right)$ is used
here because of its analytic form and efficiency. The learnable weight and bias are
represented by $W_{dr-ta} \in \mathbb{R}^{D_{in} \times D_{out}}$ and $b_{dr-ta} \in \mathbb{R}^{D_{out}}$, where $D_{in}$ and $D_{out}$ are
the corresponding dimensions of $H_{dr-ta}$ and $\Phi'_{dr-ta}$, respectively. Following the
acquisition of $\Phi'_{dr-ta}$, two fully connected layers are used to estimate the mean
and variance:


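$$\mu_{dr-ta} = f\left(\Phi'_{dr-ta} W^{\mu}_{dr-ta} + b^{\mu}_{dr-ta}\right) \tag{10.36}$$

and

$$\sigma_{dr-ta} = f\left(\Phi'_{dr-ta} W^{\sigma}_{dr-ta} + b^{\sigma}_{dr-ta}\right), \tag{10.37}$$

where $W^{\mu}_{dr-ta}, W^{\sigma}_{dr-ta} \in \mathbb{R}^{D_{out} \times D}$ and $b^{\mu}_{dr-ta}, b^{\sigma}_{dr-ta} \in \mathbb{R}^{D}$ are learnable weights and biases, as indicated
before. The main embeddings $\Phi^{k}_{dr}$ are then sampled by

$$\Phi^{k}_{dr} = \mu_{dr-ta} + \sigma_{dr-ta} \odot \varepsilon, \tag{10.38}$$

where $\odot$ is the Hadamard product and $\varepsilon \sim \mathcal{N}(0, I)$.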
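
As a minimal sketch of Eqs. (10.35)-(10.38), the forward pass of the vertex encoder can be written in NumPy as follows. The function names `encode_vertices` and `init_params`, the parameter dictionary, and the random toy incidence matrix are illustrative assumptions and are not taken from the HHDTI implementation; tanh is used for every layer, as stated in the text.

```python
import numpy as np

def encode_vertices(H, params, rng):
    """Forward pass of the vertex encoder, Eqs. (10.35)-(10.38).

    H      : (N, D_in) incidence matrix, one row per drug.
    params : dict of weights and biases (hypothetical container for this sketch).
    rng    : NumPy random Generator used to draw the noise epsilon.
    """
    f = np.tanh                                                 # nonlinear activation f(.)
    shared = f(H @ params["W"] + params["b"])                   # Eq. (10.35)
    mu = f(shared @ params["W_mu"] + params["b_mu"])            # Eq. (10.36)
    sigma = f(shared @ params["W_sigma"] + params["b_sigma"])   # Eq. (10.37)
    eps = rng.standard_normal(mu.shape)                         # eps ~ N(0, I)
    return mu + sigma * eps, mu, sigma                          # Eq. (10.38)

def init_params(d_in, d_out, d, rng, scale=0.1):
    """Small random initial parameters (an arbitrary choice for this sketch)."""
    return {
        "W": scale * rng.standard_normal((d_in, d_out)), "b": np.zeros(d_out),
        "W_mu": scale * rng.standard_normal((d_out, d)), "b_mu": np.zeros(d),
        "W_sigma": scale * rng.standard_normal((d_out, d)), "b_sigma": np.zeros(d),
    }

rng = np.random.default_rng(0)
H_dr_ta = (rng.random((5, 4)) < 0.5).astype(float)   # toy data: 5 drugs, 4 target hyperedges
params = init_params(d_in=4, d_out=8, d=3, rng=rng)
phi_main, mu, sigma = encode_vertices(H_dr_ta, params, rng)
print(phi_main.shape)   # (5, 3)
```

In a trainable model the parameters would be optimized by backpropagation through the sampled embeddings; the sketch only illustrates the data flow of the reparameterized sampling step.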

In this way, the high-order structural correlations from the direct DTIs can be
captured by the main embeddings. In addition to such direct interactions,
other types of interactions can also contribute to DTI prediction, which has been
validated by recent studies [19]. For instance, the similarity of phenotypic side
effects can indicate whether two drugs share a target [20, 21]. It has also been
reported that targets can serve as a connection between drugs and diseases [22].
Inspired by these findings, auxiliary data are integrated into HHDTI to provide
complementary information, so as to improve prediction accuracy and to handle
extreme cases such as the cold-start problem (where only a few DTIs are available).

Specifically, the dr-di and ta-di correlations are considered in HHDTI. The
embeddings learned from the dr-di incidence matrix $H_{dr-di}$ are
called drug assisted embeddings, which serve as the auxiliary representation for the
drug's main embeddings. The drug assisted embeddings are learned by the HGNN
model [17], with which the high-order correlations are encoded as


$$\mathrm{Convh}(H, X \mid W) = f\left( (D^{v})^{-1/2} H (D^{e})^{-1} H^{\top} (D^{v})^{-1/2} X W \right), \tag{10.39}$$

where $D^{v}$ and $D^{e}$ are the degree matrices of vertices and hyperedges, respectively.
The corresponding degrees of a vertex and a hyperedge are $(D^{v})_{kk} = \sum_{j} H_{kj}$
and $(D^{e})_{jj} = \sum_{k} H_{kj}$, respectively. The matrix $W$ is the learnable weight
parameter, and $(\cdot)^{\top}$ is the transposition operator. Specifically, the convolutional
layer used to learn the drug assisted embeddings $\Phi^{s}_{dr}$ can be formulated as


$$\Phi^{s,(l)}_{dr} = \mathrm{Convh}\left(H_{dr-di}, \Phi^{s,(l-1)}_{dr} \mid W^{(l-1)}_{dr-di}\right), \tag{10.40}$$

where $\Phi^{s,(l-1)}_{dr}$, $\Phi^{s,(l)}_{dr}$, and $W^{(l-1)}_{dr-di}$ represent the $(l-1)$-th layer's input, output, and
trainable weight matrix, respectively. Here, the identity matrix is set as the initial
value for $X$. That is, we have $\Phi^{s,(0)}_{dr} = X = I$.

To create the overall embeddings, an attention module is used to fuse the main
embeddings and assisted embeddings into a single shared space. Specifically, the
bi-embedding attention fusion process determines the coefficients $\omega^{i}$, which
assign different weights to the main embeddings and assisted embeddings:


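$$\omega^{i} = \frac{\exp\left( f\left(\Phi^{i}_{dr} W^{f} + b^{f}\right) \cdot P^{f} \right)}{\sum_{j \in \{k, s\}} \exp\left( f\left(\Phi^{j}_{dr} W^{f} + b^{f}\right) \cdot P^{f} \right)}, \tag{10.41}$$

where $W^{f} \in \mathbb{R}^{D \times D'}$, $b^{f} \in \mathbb{R}^{D'}$, and $P^{f} \in \mathbb{R}^{D' \times 1}$ are trainable parameters, and $D$ and
$D'$ are the corresponding dimensions. The overall drug embeddings $\Phi_{dr}$ can then be
obtained by

$$\Phi_{dr} = \omega^{k} \Phi^{k}_{dr} + \omega^{s} \Phi^{s}_{dr}. \tag{10.42}$$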
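
The hypergraph convolution of Eq. (10.39), which produces the assisted embeddings, can be sketched as below. This is only an illustration under stated assumptions: the function name `hgnn_conv`, the tanh default activation, and the toy drug-disease incidence matrix are hypothetical, and every vertex and hyperedge is assumed to have a non-zero degree.

```python
import numpy as np

def hgnn_conv(H, X, W, activation=np.tanh):
    """One hypergraph convolution layer, Eq. (10.39).

    H : (N, E) incidence matrix, X : (N, C_in) vertex features, W : (C_in, C_out).
    Assumes every vertex and every hyperedge has a non-zero degree.
    """
    d_v = H.sum(axis=1)                          # vertex degrees (D^v)_kk
    d_e = H.sum(axis=0)                          # hyperedge degrees (D^e)_jj
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    G = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return activation(G @ X @ W)

rng = np.random.default_rng(0)
# Toy data: 4 drugs and 2 disease hyperedges.
H_dr_di = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)
X0 = np.eye(4)                                   # identity initial features, as in the text
W0 = 0.1 * rng.standard_normal((4, 3))
phi_assist = hgnn_conv(H_dr_di, X0, W0)
print(phi_assist.shape)   # (4, 3)
```

Stacking several such layers, each with its own weight matrix, corresponds to the layer-wise recursion of Eq. (10.40).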


The overall embeddings of targets $\Phi_{ta}$ are generated similarly. The main
difference is that $H_{ta-dr}$ and $H_{ta-di}$ are used as inputs. The target main
embeddings $\Phi^{k}_{ta}$ are learned using the same vertex encoder as that of drugs. The
HGNN model is also adopted to yield the target assisted embeddings $\Phi^{s}_{ta}$ from the
target-disease association hypergraph. Finally, the embedding attention fusion is
applied to obtain the overall target embeddings $\Phi_{ta}$.
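
The bi-embedding attention fusion of Eqs. (10.41) and (10.42), which is shared by drugs and targets, might look as follows in NumPy. The function name `attention_fusion` and the random toy inputs are assumptions of this sketch, and the attention coefficients are assumed to be computed separately for each vertex (row), which the equations do not state explicitly.

```python
import numpy as np

def attention_fusion(phi_main, phi_assist, W_f, b_f, P_f, f=np.tanh):
    """Bi-embedding attention fusion, Eqs. (10.41)-(10.42).

    phi_main, phi_assist : (N, D) main and assisted embeddings.
    W_f : (D, D'), b_f : (D',), P_f : (D', 1) trainable parameters.
    Returns the fused (N, D) embeddings and the (N, 2) attention weights.
    """
    scores = np.hstack([
        f(phi_main @ W_f + b_f) @ P_f,        # score of the main embeddings
        f(phi_assist @ W_f + b_f) @ P_f,      # score of the assisted embeddings
    ])                                        # shape (N, 2)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    omega = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    fused = omega[:, [0]] * phi_main + omega[:, [1]] * phi_assist
    return fused, omega

rng = np.random.default_rng(0)
phi_k = rng.standard_normal((5, 3))           # toy main embeddings
phi_s = rng.standard_normal((5, 3))           # toy assisted embeddings
W_f = rng.standard_normal((3, 4))
b_f = np.zeros(4)
P_f = rng.standard_normal((4, 1))
phi_overall, omega = attention_fusion(phi_k, phi_s, W_f, b_f, P_f)
print(omega.sum(axis=1))   # every row sums to 1
```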


(3) Drug-Target Interactions Prediction 

The likelihood of the drug and target embeddings is calculated to create the
reconstruction space $A$, from which the DTI predictions are generated. That is, we
have

$$A = \mathrm{Sigmoid}\left( \Phi_{dr} \left(\Phi_{ta}\right)^{\top} \right), \tag{10.43}$$


where $\mathrm{Sigmoid}(\cdot)$ is the sigmoid function. We then give the variational lower bound
$\mathcal{L}$, which is optimized by

$$\mathcal{L} = \mathbb{E}_{q}\left[\log p\left(A \mid \Phi_{dr}, \Phi_{ta}\right)\right] - \beta \left( \mathrm{KL}\left[q\left(\Phi^{k}_{dr} \mid H_{dr-ta}\right) \,\|\, p\left(\Phi^{k}_{dr}\right)\right] + \mathrm{KL}\left[q\left(\Phi^{k}_{ta} \mid H_{ta-dr}\right) \,\|\, p\left(\Phi^{k}_{ta}\right)\right] \right), \tag{10.44}$$

where $\mathrm{KL}[q(\cdot) \,\|\, p(\cdot)]$ is the Kullback-Leibler divergence from distribution $q(\cdot)$ to $p(\cdot)$.
Varying $\beta$ changes the amount of learning pressure applied during training and
thus yields different learned representations. Inspired by the
variational auto-encoder, the Gaussian priors $p(\Phi^{k}_{dr}) = \prod_{i} p(\phi^{dr}_{i}) = \prod_{i} \mathcal{N}(\phi^{dr}_{i} \mid 0, I)$
and $p(\Phi^{k}_{ta}) = \prod_{j} p(\phi^{ta}_{j}) = \prod_{j} \mathcal{N}(\phi^{ta}_{j} \mid 0, I)$ can be taken into consideration, where $\phi^{dr}_{i}$ and
$\phi^{ta}_{j}$ denote the embeddings of the $i$-th drug and the $j$-th target.
Here, $\mathbb{E}_{q}[\log p(\cdot \mid \cdot)]$ is the likelihood of the reconstruction space $A$.

In this part, we introduce a general hypergraph-based framework for DTI
prediction. Note that the framework is restricted neither to these types of
complex interactions nor to the DTI prediction task; other types of interactions
that may contribute to DTI prediction, or even other tasks involving complex
correlations, can also be considered.
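
The prediction step of Eq. (10.43) and the terms of the lower bound in Eq. (10.44) can be sketched as follows. The function names are hypothetical; the reconstruction likelihood is modeled here as a Bernoulli (binary cross-entropy) term and the posterior as a diagonal Gaussian, both of which are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def predict_dti(phi_dr, phi_ta):
    """Reconstruction space A = Sigmoid(phi_dr phi_ta^T), Eq. (10.43)."""
    return 1.0 / (1.0 + np.exp(-(phi_dr @ phi_ta.T)))

def gaussian_kl(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over all entries."""
    var = sigma ** 2 + 1e-12                  # small constant avoids log(0)
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))

def negative_elbo(A_true, A_pred, kl_dr, kl_ta, beta=1.0):
    """Negative of the beta-weighted lower bound in Eq. (10.44).

    The likelihood is modeled as Bernoulli, which is an assumption of this sketch.
    """
    eps = 1e-12
    log_lik = np.sum(A_true * np.log(A_pred + eps)
                     + (1.0 - A_true) * np.log(1.0 - A_pred + eps))
    return -(log_lik - beta * (kl_dr + kl_ta))

rng = np.random.default_rng(0)
phi_dr = rng.standard_normal((4, 3))          # toy overall drug embeddings
phi_ta = rng.standard_normal((5, 3))          # toy overall target embeddings
A_pred = predict_dti(phi_dr, phi_ta)          # (4, 5) interaction probabilities
A_true = (rng.random((4, 5)) < 0.3).astype(float)
kl = gaussian_kl(np.zeros((4, 3)), np.ones((4, 3)))   # standard normal posterior
loss = negative_elbo(A_true, A_pred, kl_dr=kl, kl_ta=0.0, beta=0.5)
print(A_pred.shape)   # (4, 5)
```

A larger `beta` puts more pressure on the posterior to stay close to the Gaussian prior, which is the effect of varying $\beta$ described above.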

In real-world applications, annotating such biomedical data is expensive and
time-consuming. Therefore, self-supervised learning has
received a lot of attention recently, since it can mine useful information from the
data without manual labels. Under such circumstances, it is of great significance
to further devise self-supervised hypergraph computation methods for DTI prediction.


10.5 Medical Image Segmentation 


In the field of medical imaging, hypergraph-based image segmentation methods also
play a crucial role. Traditional multi-atlas segmentation (MAS) methods have
limitations in segmenting anatomical structures with poor image contrast. A
hypergraph can address this problem by modeling the complex within-subject and
subject-to-atlas image voxel relationships and by propagating the labels on the
atlas images to the target subject images.

This method is named hierarchical hypergraph patch labeling (HHPL) [8]. It
characterizes higher-order associations between context features by constructing
a hypergraph and transforms hypergraph learning into a hierarchical model. At
the same time, a dynamic label propagation strategy is used, in which reliably
identified labels on the subject image are added to help predict the remaining labels.

As shown in Fig. 10.11, the pairwise relations used in the MAS method are compared
with the complex higher-order associations in hyperedges, where $p_i$ is a subject
image voxel, and $R_i(l)$ is defined as a 3-D cube of side length $l$ centered on
$p_i$. Image patches are extracted from the target subject image at voxel $p_i$ and
from the registered atlas images within the corresponding local neighborhood $R_i(l)$.
Hyperedges can be constructed similarly between the atlas image voxels and target
subject image voxels with the high-level context features from the label probability
map.

Fig. 10.11 Comparison of a simple pairwise relationship in the conventional MAS methods and
the complex groupwise relationship in hyperedges (with much richer information). This figure is
from [8]


In particular, the label of a target subject vertex is affected by the connected
subject vertices whose labels are being estimated and by the related atlas vertices
with known labels. The label propagation process follows two principles: (1)
vertices grouped in the same hyperedge should have the same anatomical label, and
(2) for vertices with known labels, the label difference before and after label
propagation should be as small as possible.
Therefore, the objective function of hypergraph learning is defined as follows:


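$$\arg\min_{\mathbf{f}} \left\{ \|\mathbf{y} - \mathbf{f}\|_2^2 + \lambda \cdot \Omega(\mathbf{f}, H, W, D_e, D_v) \right\}. \tag{10.45}$$

The first term penalizes the difference between the initial
label vector $\mathbf{y}$ and the prediction vector $\mathbf{f}$. The second term is the graph balance term
defined as

$$\Omega(\mathbf{f}, H, W, D_e, D_v) = \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{v, v' \in e} \frac{w(e) h(v, e) h(v', e)}{\delta(e)} \left( \frac{f(v)}{\sqrt{d(v)}} - \frac{f(v')}{\sqrt{d(v')}} \right)^2. \tag{10.46}$$

We can determine the optimal $\mathbf{f}$ by differentiating the objective function with respect
to $\mathbf{f}$:

$$\mathbf{f} = (I + \lambda \cdot \Delta)^{-1} \mathbf{y}, \tag{10.47}$$

where $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}$ is the normalized hypergraph Laplacian, for which
$\Omega = \mathbf{f}^{\top} \Delta \mathbf{f}$.

Having obtained the optimized $\mathbf{f}$, the anatomical label of each voxel on the
subject image is determined by the sign of the corresponding value:

$$\mathrm{label}(p_i) = \begin{cases} \text{foreground}, & f_i > 0 \\ \text{background}, & f_i < 0 \end{cases}, \quad i = 1, 2, \ldots, |P|. \tag{10.48}$$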
The segmentation can then be computed repeatedly to improve the
performance by iterating: (1) hypergraph construction with high-level context features; (2)
label propagation on the hypergraph; and (3) the refinement of context features. The
segmentation results can be found in Fig. 10.12.
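
The label propagation step of Eqs. (10.45)-(10.48) can be illustrated with a small NumPy sketch. The helper names, the toy hypergraph, and the value of `lam` are assumptions chosen for illustration; HHPL itself builds the hyperedges from image patches and context features, which is not reproduced here.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian: I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                          # vertex degrees d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)                  # hyperedge degrees delta(e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n_v) - theta

def propagate_labels(H, y, lam=1.0, w=None):
    """Closed-form solution f = (I + lam * Delta)^-1 y of Eq. (10.47)."""
    delta = hypergraph_laplacian(H, w)
    return np.linalg.solve(np.eye(len(y)) + lam * delta, y)

# Toy example: vertices 0-2 share one hyperedge, vertices 3-5 another.
H = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])   # +1 foreground, -1 background, 0 unknown
f = propagate_labels(H, y, lam=10.0)
labels = np.where(f > 0, "foreground", "background")    # Eq. (10.48)
print(labels)
```

In this toy case the single known label in each hyperedge spreads to the two unlabeled vertices of the same hyperedge, which is the behavior required by the first propagation principle.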


10.6 Summary 


In this chapter, we introduce three typical applications of hypergraph computa- 
tion in medical and biological tasks. In computer-aided diagnosis, three specific 
applications are covered, i.e., the identification and medical image retrieval of MCI 
and the identification of COVID-19 by CT imaging. These examples show how to 
adopt hypergraph computation for the tasks of classification and retrieval in medical 
and biological fields. For the survival prediction with histopathological images, the 
demonstrated hypergraph computation techniques can also be expanded to similar 
regression tasks. The introduced paradigm may also be applied to other cases with 
complicated connections. In summary, these examples demonstrate that the high-order
correlations in medical and biological data can be modeled and learned by
hypergraph computation, which can benefit the corresponding studies.
In addition to the aforementioned examples, there are many medical and biological 
applications that have the potential to be explored with hypergraph computation, 
such as medical image enhancement and multi-modal fusion.


Chapter 11
Hypergraph Computation for Computer Vision


Abstract In this chapter, the applications of hypergraph computation in computer
vision are introduced. Computer vision is one of the most widely used application
areas of hypergraph computation. Hypergraphs can be constructed by modeling the
high-order relationships among visual samples or within a visual sample, and
computer vision tasks can then be solved by hypergraph computation procedures.
More specifically, three typical applications, namely visual classification, 3D
object retrieval, and tag-based social image retrieval, are provided, in which
hypergraphs are used to model the high-order relationships among samples and to
solve visual problems by hypergraph computation. For example, in social image
retrieval, hypergraphs are used to model the high-order relationships among social
images based on both visual and textual information, which is a high-order
modeling of the elements within samples.


11.1 Introduction 


Hypergraphs have demonstrated excellent performance in modeling the high-order
relationships of data and have been applied in several fields. In computer vision, this
property of hypergraphs is also promising for a wide range of tasks, and much
research focuses on how to use hypergraph modeling to solve visual problems.
On the one hand, hypergraphs can be used to model the high-order relationships of
images within a class or across different classes, and then to conduct hypergraph-based
label propagation, which is useful for visual classification and retrieval. On
the other hand, the relations among the elements within a visual object can be
modeled to exploit the structural information.

In this chapter, we discuss three typical applications of hypergraph computation
in computer vision, i.e., visual classification [1–6], 3D object retrieval [2, 7–12],
and tag-based social image retrieval [13–17]. In these applications, the vertices
represent the visual objects, and a hypergraph is constructed to formulate the
high-order correlations among all the samples by some metric. In this hypergraph, some
vertices are labeled, and the predictions for the other vertices can be obtained by the
label propagation procedure. Visual classification and retrieval problems can be solved
by this method. The elements within one sample, such as the pixels in an image, can
also be used to construct hypergraphs. The properties of each element, including its
semantic information, can then be learned by conducting hypergraph computation.
Part of the work introduced in this chapter has been published in [1, 2, 13].


11.2 Visual Classification 


Visual classification is the most common application of hypergraphs in computer vision.
Visual data have a strong clustering characteristic, i.e., visual objects under
one label show a clustered distribution in the feature space. This property is fully
consistent with the hypothesis of hypergraph-based semi-supervised learning, which
is therefore theoretically well-suited for image classification. A large number of
studies have demonstrated its good performance [1, 2]. While there are many
applications of hypergraph computation for image classification, almost all of them
follow the same process. It starts with hypergraph modeling of the visual data. After
features are extracted by some feature extractor, the hypergraph is modeled based on
the nearest neighbor relationships of the visual features in the Euclidean space, and
then label propagation on the hypergraph is adopted to achieve classification. We use
the example of multi-view 3D object classification to introduce the process in detail.

First, view-based 3D object classification needs to be introduced. Each 3D object
can be represented by a set of views. Compared with the model-based representation,
the multi-view representation is more flexible and incurs less computational
overhead, while retaining good representation capability. The classification of
3D objects is illustrated in Fig. 11.1. After obtaining the multi-view 3D object
data, the first step is to extract the features. There are many feature extraction
methods for multi-view 3D objects, such as MVCNN [18] and Zernike moments.
After obtaining the features of each group of views and of each image in
them, hyperedges can be constructed by k-NN with the Euclidean distance as the
metric. If several different features are used, multiple hypergraphs can be
constructed, i.e., each hypergraph is constructed based on one feature. If $m$ features
are used, $m$ hypergraphs can be generated, denoted by $\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1, W_1)$, $\mathcal{G}_2 =
(\mathcal{V}_2, \mathcal{E}_2, W_2)$, ..., $\mathcal{G}_m = (\mathcal{V}_m, \mathcal{E}_m, W_m)$. After obtaining multiple hypergraphs, a
weight $\omega_i$, $i = 1, \ldots, m$, is assigned to each hypergraph $\mathcal{G}_i$, and these weights
constitute a weight vector $\boldsymbol{\omega}$. Up to this point, we have obtained $m$ weighted
hypergraphs from the multi-view 3D dataset.

Fig. 11.1 An illustration of the view-based 3D object classification framework. This figure is from [1]
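
The k-NN hyperedge construction described above can be sketched as follows. The function name `knn_incidence`, the toy features, and the uniform initial hypergraph weights are assumptions of this example; in practice the features would come from an extractor such as MVCNN.

```python
import numpy as np

def knn_incidence(X, k):
    """Build an (N, N) incidence matrix from features X of shape (N, d).

    Hyperedge j connects vertex j with its k nearest neighbors under the
    Euclidean distance, so every hyperedge contains k + 1 vertices.
    """
    sq_norms = (X ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(dist2, -np.inf)                  # keep each centroid in its own hyperedge
    neighbors = np.argsort(dist2, axis=1)[:, :k + 1]  # centroid plus k neighbors
    H = np.zeros((len(X), len(X)))
    for j, members in enumerate(neighbors):
        H[members, j] = 1.0
    return H

# Toy data: 6 objects described by two different (hypothetical) features.
X_a = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
X_b = X_a[:, ::-1].copy()
hypergraphs = [knn_incidence(X, k=2) for X in (X_a, X_b)]           # one hypergraph per feature
omega = np.full(len(hypergraphs), 1.0 / len(hypergraphs))           # uniform weights (assumption)
print(hypergraphs[0][:, 0])   # [1. 1. 1. 0. 0. 0.]
```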


Transductive Hypergraph Computation 

After obtaining multiple hypergraphs, the label of each vertex can be obtained by
hypergraph-based semi-supervised learning. The pipeline is shown in
Fig. 11.2a. Note that since multi-modal data are used, the contributions of
different modalities to the classification may differ. We therefore also have
to take the modality weights into account when calculating the
classification results and to update these weights during the computing process. The
method of weight updating is described in the next section, and the focus here is to
establish the idea of hypergraph processing of multi-modal features.

Fig. 11.2 The general frameworks of transductive and inductive multi-hypergraph computation
algorithms. (a) tMHL: transductive multi-hypergraph computation. (b) iMHL: inductive
multi-hypergraph computation. This figure is from [1]


Inductive Hypergraph Computation 
In real-world visual classification tasks, transductive hypergraph computation
can only be updated globally, and its high time complexity can hardly meet the
efficiency requirements of visual classification. To help solve this problem, inductive
hypergraph computation is introduced, which learns both the projection from data to
labels and the weight vector of multiple hypergraphs. It can also achieve real-time
inference for newly added data, as shown in Fig. 11.2b. It is described
in the following.

In inductive hypergraph computation, a projection matrix $M$ is learned, and the
prediction for the unlabeled data is computed by $M$.

The objective function for learning $M$ is formulated as


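$$\arg\min_{M} \left\{ \Omega(M) + \lambda R_{emp}(M) + \mu \Phi(M) \right\}. \tag{11.1}$$

Under the assumption that vertices connected by one or more hyperedges are more
likely to have the same label, the hypergraph Laplacian regularizer $\Omega(M)$
is defined as

$$\Omega(M) = \frac{1}{2} \sum_{k=1}^{c} \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{W(e) H(u, e) H(v, e)}{\delta(e)} \, \vartheta = \mathrm{tr}\left( M^{\top} X \Delta X^{\top} M \right), \tag{11.2}$$

where $\vartheta = \left( \frac{(X^{\top} M)(u, k)}{\sqrt{d(u)}} - \frac{(X^{\top} M)(v, k)}{\sqrt{d(v)}} \right)^2$ and $c$ is the number of columns of $M$ (one per class). It can be noted
that $\Omega(M)$ is a quadratic form of $M$. The empirical loss term $R_{emp}(M)$ is defined as

$$R_{emp}(M) = \|X^{\top} M - Y\|_F^2. \tag{11.3}$$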


$\Phi(M)$ is an $\ell_{2,1}$-norm regularizer. It is used to avoid overfitting of $M$. Meanwhile,
it makes the rows of the matrix sparse, so that only the informative rows are retained.
It is defined as

$$\Phi(M) = \|M\|_{2,1}. \tag{11.4}$$

The objective function of the inductive hypergraph computation task can be written
as

$$\arg\min_{M} \left\{ \mathrm{tr}\left( M^{\top} X \Delta X^{\top} M \right) + \lambda \|X^{\top} M - Y\|_F^2 + \mu \|M\|_{2,1} \right\}. \tag{11.5}$$

Note that the regularizer $\Phi(M)$ is convex but non-smooth. Therefore, the
objective function can be relaxed to the following:

$$\arg\min_{M} \left\{ \mathrm{tr}\left( M^{\top} X \Delta X^{\top} M \right) + \lambda \|X^{\top} M - Y\|_F^2 + \mu \, \mathrm{tr}\left( M^{\top} U M \right) \right\}, \tag{11.6}$$


where $U$ is a diagonal matrix, and its elements are defined as

$$U_{ii} = \frac{1}{2 \|M(i, :)\|_2}, \quad i = 1, \ldots, d, \tag{11.7}$$

with $d$ being the number of rows of $M$.

To solve this optimization problem, $U$ is first set to the identity matrix, and
the iteratively reweighted least squares method is adopted. More specifically, each
variable is updated alternately with the other fixed until convergence is achieved.
First, $U$ is fixed, and the derivative of the objective with respect to $M$ is set
to zero. The closed-form solution is

$$M = \lambda \left( X \Delta X^{\top} + \lambda X X^{\top} + \mu U \right)^{-1} X Y. \tag{11.8}$$

Then $M$ is fixed, while $U$ is updated by Eq. (11.7). The procedure is repeated
until both $U$ and $M$ converge.

Given a testing sample $x^{t}$, the prediction of $x^{t}$ can be obtained by

$$C(x^{t}) = \arg\max_{k} \left( (x^{t})^{\top} M \right)_{k}. \tag{11.9}$$
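
The alternating updates of Eqs. (11.7) and (11.8) and the prediction rule of Eq. (11.9) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the iMHL implementation: the function names, the fixed number of iterations (instead of a convergence test), the small constant `eps` that guards against zero rows, and the toy data are all hypothetical, and the learning of the hypergraph weights is omitted.

```python
import numpy as np

def inductive_projection(X, Y, delta, lam=1.0, mu=0.1, n_iter=20, eps=1e-8):
    """Iteratively reweighted least squares for the relaxed objective, Eq. (11.6).

    X : (d, n) features, one column per sample; Y : (n, c) label matrix;
    delta : (n, n) hypergraph Laplacian. Returns the (d, c) projection M.
    """
    d = X.shape[0]
    U = np.eye(d)                                    # U starts as the identity matrix
    A_fixed = X @ delta @ X.T + lam * X @ X.T
    for _ in range(n_iter):
        M = lam * np.linalg.solve(A_fixed + mu * U, X @ Y)         # Eq. (11.8)
        row_norms = np.linalg.norm(M, axis=1)
        U = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))      # Eq. (11.7)
    return M

def classify(x_test, M):
    """Predicted class index of a new sample, Eq. (11.9)."""
    return int(np.argmax(x_test @ M))

# Toy data: 4 training samples with 2-D features, 2 classes, 2 hyperedges.
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
H = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
theta = H @ np.diag(1.0 / H.sum(axis=0)) @ H.T       # unit weights, every vertex degree is 1
delta = np.eye(4) - theta
M = inductive_projection(X, Y, delta)
print(classify(np.array([1.0, 0.0]), M), classify(np.array([0.0, 1.0]), M))   # 0 1
```

Once `M` is learned, classifying a new sample costs only one vector-matrix product, which is what makes the inductive variant suitable for online inference.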


Hypergraph computation can achieve good results in visual classification prob- 
lems, where inductive hypergraph computation can achieve real-time online classi- 
fication while maintaining good classification performance. 


11.3 3D Object Retrieval 


3D object retrieval aims to find similar 3D objects in the database, given
a 3D query. Usually, each 3D object can be described by several different types
of data, such as multiple views, point clouds, meshes, or voxels. The main task of
3D object retrieval is to define an appropriate measure to calculate the similarity
between each pair of 3D objects. Therefore, how to define such measures is the key
to 3D object retrieval. Traditional methods mainly focus on either representation
learning for each type of data or the distance metric for specific features. It is
noted that the correlations among 3D objects are very complex, and both pairwise
correlations and beyond-pairwise correlations exist. To achieve better 3D object
retrieval performance, it is important to take such high-order correlations among 3D
objects into consideration. In this retrieval task, each vertex denotes a 3D object in
the database, and thus the number of vertices is equivalent to the number of objects
in the database.

Fig. 11.3 An illustration of the hypergraph computation method for 3D object retrieval using
multiple views. This figure is from [2]

Hypergraphs can be used for such correlation modeling in 3D object retrieval. We
introduce the hypergraph computation method [2] for 3D object retrieval here, and
the framework is shown in Fig. 11.3. First, a group of hypergraphs is generated,
and the learning process is then conducted for similarity measurement.

We take the multi-view representation as an example. All views of these 3D
objects are first grouped into clusters. Objects with views in one cluster are then
connected by a hyperedge (note that a hyperedge can connect multiple vertices in a
hypergraph). As a result, a hypergraph can be generated, in which vertices represent
objects in a database. A hyperedge's weight is determined by the visual similarity
between any two views in a cluster. Multiple hypergraphs can be generated by
varying the number of clusters. These hypergraphs encode the relationships between
objects at various granularities. When two 3D objects are connected by more and
stronger hyperedges, they have a higher similarity. This information can then be
used for 3D object retrieval.

To generate a 3D object hypergraph, each object is regarded as a vertex in the hypergraph
$\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$. The generated hypergraph has $n$ vertices if there are $n$ objects in a
database. Each view of these 3D objects can be represented by pre-defined features,
which can differ from task to task. Given these features, the K-means
clustering method can be used to group the views into clusters. For each
cluster, a hyperedge connects all the objects that have views in this cluster. There
are two diagonal matrices $D_v$ and $D_e$ that represent the vertex and hyperedge degrees,
respectively, and an incidence matrix $H$ is generated. The weight of a hyperedge
$e$ can be measured by

$$w(e) = \sum_{x_a, x_b \in e} \exp\left( -\frac{d(x_a, x_b)^2}{\sigma^2} \right), \tag{11.10}$$

where $d(x_a, x_b)$ is the distance between $x_a$ and $x_b$, which are two views in the
same view cluster. $d(x_a, x_b)$ can be calculated using the Euclidean distance. The
parameter $\sigma$ is empirically set to the median distance between all pairs of these
views. The hypergraph generation procedure is shown in Fig. 11.4.

Fig. 11.4 An illustration of the hypergraph construction for 3D object hypergraph. (a) Views of
different visual objects. (b) Hyperedges construction by view clusters. This figure is from [2]
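
As an illustration of this construction, the following sketch builds the incidence matrix and the hyperedge weights of Eq. (11.10) from views that have already been clustered. The function name, the argument layout, and the toy data are assumptions; the cluster indices are taken as given (for example, from any K-means implementation), and the sum in Eq. (11.10) is assumed to run over unordered pairs of distinct views.

```python
import numpy as np

def view_cluster_hypergraph(views, view_owner, view_cluster, n_objects):
    """Build the incidence matrix and hyperedge weights from clustered views.

    views        : (n_views, d) view features.
    view_owner   : (n_views,) index of the object each view belongs to.
    view_cluster : (n_views,) cluster index of each view.
    """
    n_clusters = int(view_cluster.max()) + 1
    diff = views[:, None, :] - views[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))              # pairwise Euclidean distances
    sigma = np.median(dist[np.triu_indices(len(views), k=1)])   # median over all view pairs
    H = np.zeros((n_objects, n_clusters))
    w = np.zeros(n_clusters)
    for c in range(n_clusters):
        idx = np.flatnonzero(view_cluster == c)
        H[view_owner[idx], c] = 1.0                       # objects with a view in cluster c
        d_c = dist[np.ix_(idx, idx)]
        pairs = np.triu_indices(len(idx), k=1)            # each unordered pair once
        w[c] = np.exp(-d_c[pairs] ** 2 / sigma ** 2).sum()    # Eq. (11.10)
    return H, w

# Toy data: 3 objects with 2 views each, already grouped into 2 clusters.
views = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
view_owner = np.array([0, 1, 1, 0, 2, 2])
view_cluster = np.array([0, 0, 0, 1, 1, 1])
H, w = view_cluster_hypergraph(views, view_owner, view_cluster, n_objects=3)
print(H)
```

Repeating the construction with different numbers of clusters yields the multiple hypergraphs of different granularities mentioned above.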

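Let $\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1, W_1)$, $\mathcal{G}_2 = (\mathcal{V}_2, \mathcal{E}_2, W_2)$, ..., and $\mathcal{G}_{n_g} = (\mathcal{V}_{n_g}, \mathcal{E}_{n_g}, W_{n_g})$
denote $n_g$ hypergraphs, and let $\{D_{v_1}, D_{v_2}, \ldots, D_{v_{n_g}}\}$, $\{D_{e_1}, D_{e_2}, \ldots, D_{e_{n_g}}\}$, and
$\{H_1, H_2, \ldots, H_{n_g}\}$ be the vertex degree matrices, hyperedge degree matrices, and
incidence matrices, respectively. The retrieval results are based on the fusion of these
hypergraphs. The weight of the $i$-th hypergraph is denoted by $\alpha_i$, where $\sum_{i=1}^{n_g} \alpha_i = 1$
and $\alpha_i \geq 0$.

It is possible to consider retrieval as a one-class classification problem [19].
Accordingly, we formulate the transductive inference as a regularization problem,
$\arg\min_{\mathbf{f}} \{\lambda R_{emp}(\mathbf{f}) + \Omega(\mathbf{f})\}$, in which the regularizer term $\Omega(\mathbf{f})$ is defined by

$$\Omega(\mathbf{f}) = \frac{1}{2} \sum_{i=1}^{n_g} \alpha_i \sum_{e \in \mathcal{E}_i} \sum_{u, v \in \mathcal{V}_i} \frac{w_i(e) H_i(u, e) H_i(v, e)}{\delta_i(e)} \left( \frac{f(u)}{\sqrt{d_i(u)}} - \frac{f(v)}{\sqrt{d_i(v)}} \right)^2, \tag{11.11}$$

where the vector $\mathbf{f}$ represents the relevance score to be learned.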
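
For fixed hypergraph weights, the regularization problem above has a closed-form solution, which the following sketch uses to rank the database objects for one query. The function names and the toy hypergraphs are hypothetical, and the empirical loss is assumed to be $R_{emp}(\mathbf{f}) = \|\mathbf{f} - \mathbf{y}\|^2$ with $\mathbf{y}$ set to 1 for the query and 0 elsewhere, which the text does not specify here.

```python
import numpy as np

def laplacian(H, w):
    """Normalized hypergraph Laplacian of one hypergraph."""
    d_v = H @ w
    d_e = H.sum(axis=0)
    S = np.diag(1.0 / np.sqrt(d_v))
    return np.eye(len(H)) - S @ H @ np.diag(w / d_e) @ H.T @ S

def retrieve(hypergraphs, alphas, query_index, lam=1.0):
    """Rank database objects for one query with fixed weights alpha_i.

    hypergraphs : list of (H_i, w_i) pairs; alphas : non-negative, summing to 1.
    Minimizes lam * ||f - y||^2 + sum_i alpha_i f^T Delta_i f in closed form.
    """
    n = hypergraphs[0][0].shape[0]
    fused = sum(a * laplacian(H, w) for a, (H, w) in zip(alphas, hypergraphs))
    y = np.zeros(n)
    y[query_index] = 1.0
    f = np.linalg.solve(np.eye(n) + fused / lam, y)
    return np.argsort(-f), f

# Toy data: 5 objects and two hypergraphs with different groupings.
H1 = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
H2 = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
graphs = [(H1, np.ones(2)), (H2, np.ones(2))]
ranking, f = retrieve(graphs, alphas=[0.5, 0.5], query_index=0)
print(ranking[:3])   # [0 1 2]
```

Object 1 shares a hyperedge with the query in both hypergraphs and is therefore ranked above object 2, which shares a hyperedge with the query in only one of them.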

In this way, the similarity between each object and the query can be calculated
based on the relevance score. Note that the features used in this method can
be selected according to the task, and multiple types of representations can also
be used. Given multiple features for the same data, or different features for
multi-modal data, the hypergraph(s) can be generated using the methods introduced
in Chap. 4.


11.4 Tag-Based Social Image Retrieval


Social images are widely associated with user-generated tags, which describe
the content of the images. Owing to their rich content, these tags are useful for
social image retrieval tasks. Figure 11.5 shows some examples of social images
associated with tags.

The main challenge of applying such tags to social image retrieval is that too
much noise makes it hard to mine the true relations among the tags and images,
and using the tags and images separately leads to sub-optimal retrieval
performance. In this section, we introduce a visual-textual joint relevance learning
approach using hypergraph computation [13]. Figure 11.6 illustrates
the visual-textual joint relevance learning method on the hypergraph for tag-based
social image retrieval. In this method, the features of both the images and the tags
are first extracted, and the hypergraph is constructed based on these features. Then,
the hypergraph learning method is performed, and the learned semantic similarity
can be used for tag-based social image retrieval.

In this example, the bag-of-visual-words feature is selected for image
representation. For the $i$-th image, the visual content is represented by the
bag-of-visual-words feature $f_i^{bow}$, while for the corresponding tags, the
bag-of-textual-words representation $f_i^{tag}$ is employed. Then, the
visual-content-based hyperedges and the tag-based
hyperedges are constructed, respectively. The visual-content-based hyperedges
connect the images that have the same visual word, and the tag-based hyperedges
connect the images that have the same tag word. Figure 11.7 provides examples
of the hyperedge generation process using textual information and visual information,
respectively. Therefore, the overall hypergraph has $n_e = n_c + n_t$ hyperedges, where
$n_c$ denotes the number of visual words, and $n_t$ denotes the number of tag words.
After the construction of the hypergraph, the images sharing more visual words or
tags are connected by more hyperedges, which can be used for further processing.
Figure 11.8 further shows the connections between two social images, based on the
textual and the visual information, respectively.

Fig. 11.5 Examples of social images associated with tags. This figure is from [13]

Fig. 11.8 An example of connections between two images from textual and visual directions. This
figure is from [13]
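
The construction of the visual-word and tag hyperedges described above can be sketched as follows. The function name `joint_incidence` and the toy bag-of-words matrices are assumptions chosen for illustration.

```python
import numpy as np

def joint_incidence(visual_bow, tag_bow):
    """Stack visual-word and tag hyperedges into one incidence matrix.

    visual_bow : (n_images, n_c) counts of visual words per image.
    tag_bow    : (n_images, n_t) tag indicators per image.
    Hyperedge j connects every image whose j-th word (or tag) entry is non-zero.
    """
    H_visual = (np.asarray(visual_bow) > 0).astype(float)
    H_tag = (np.asarray(tag_bow) > 0).astype(float)
    return np.hstack([H_visual, H_tag])        # n_e = n_c + n_t hyperedges

# Toy data: 3 images, 3 visual words, 2 tags.
visual_bow = np.array([[2, 0, 1], [1, 0, 0], [0, 3, 0]])
tag_bow = np.array([[1, 0], [1, 1], [0, 1]])
H = joint_incidence(visual_bow, tag_bow)
shared = H @ H.T                               # number of hyperedges shared by two images
print(H.shape, shared[0, 1])   # (3, 5) 2.0
```

The off-diagonal entries of `shared` count how many hyperedges connect two images, which reflects the statement that images sharing more visual words or tags are connected by more hyperedges.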


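Denoting $\mathbf{f}$ as the relevance score vector, $\mathbf{y}$ as the ground truth relevance, and $\mathbf{w}$
as the weight vector of hyperedges, the hypergraph computation can be formulated
as

$$\arg\min_{\mathbf{f}, \mathbf{w}} \Phi(\mathbf{f}) = \arg\min_{\mathbf{f}, \mathbf{w}} \left\{ \mathbf{f}^{\top} \Delta \mathbf{f} + \lambda \|\mathbf{f} - \mathbf{y}\|^2 + \mu \sum_{i=1}^{n_e} w(i)^2 \right\}, \quad \text{s.t.} \ \sum_{i=1}^{n_e} w(i) = 1, \tag{11.12}$$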


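where $\lambda$ and $\mu$ are the weighting parameters. The first term in Eq. (11.12) is the
regularizer on the hypergraph structure, which is used to guarantee the smoothness over
the hypergraph. The second term is the empirical loss between the relevance score
vector and the ground truth. The last term represents the squared $\ell_2$ norm of the
hyperedge weights, which is used to learn a better combination of different hyperedges.
This optimization task can be solved using alternating optimization. First, $\mathbf{w}$ is
fixed, and $\mathbf{f}$ is optimized by

$$\arg\min_{\mathbf{f}} \Phi(\mathbf{f}) = \arg\min_{\mathbf{f}} \left\{ \mathbf{f}^{\top} \Delta \mathbf{f} + \lambda \|\mathbf{f} - \mathbf{y}\|^2 \right\}, \tag{11.13}$$

from which we have

$$\mathbf{f} = \left( I + \frac{1}{\lambda} \Delta \right)^{-1} \mathbf{y}. \tag{11.14}$$

Then, $\mathbf{f}$ is fixed, and $\mathbf{w}$ is optimized by

$$\arg\min_{\mathbf{w}} \Phi(\mathbf{f}) = \arg\min_{\mathbf{w}} \left\{ \mathbf{f}^{\top} \Delta \mathbf{f} + \mu \sum_{i=1}^{n_e} w(i)^2 \right\}, \quad \text{s.t.} \ \sum_{i=1}^{n_e} w(i) = 1, \ \mu > 0. \tag{11.15}$$

The Lagrangian can be applied here, and we have

$$w(i) = \frac{1}{n_e} - \frac{\mathbf{f}^{\top} \Gamma D_e^{-1} \Gamma^{\top} \mathbf{f}}{2 n_e \mu} + \frac{\mathbf{f}^{\top} \Gamma_i D_e^{-1}(i, i) \Gamma_i^{\top} \mathbf{f}}{2 \mu}, \tag{11.16}$$

where $\Gamma = D_v^{-1/2} H$ and $\Gamma_i$ represents the $i$-th column of $\Gamma$.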
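
The alternating updates of Eqs. (11.14) and (11.16) can be sketched as below. The function name, the fixed number of iterations, and the toy hypergraph are assumptions of this sketch. Note that Eq. (11.16) does not enforce non-negative weights, so the sketch assumes that `mu` is large enough for all weights to stay positive.

```python
import numpy as np

def joint_relevance_learning(H, y, lam=1.0, mu=10.0, n_iter=5):
    """Alternating optimization of Eq. (11.12) over f and the hyperedge weights w.

    Assumes mu is large enough that all weights remain positive.
    """
    n_v, n_e = H.shape
    w = np.full(n_e, 1.0 / n_e)                 # uniform start satisfies sum(w) = 1
    d_e = H.sum(axis=0)                         # hyperedge degrees
    for _ in range(n_iter):
        d_v = H @ w                             # vertex degrees under the current w
        gamma = H / np.sqrt(d_v)[:, None]       # Gamma = Dv^-1/2 H
        delta = np.eye(n_v) - gamma @ np.diag(w / d_e) @ gamma.T
        f = np.linalg.solve(np.eye(n_v) + delta / lam, y)          # Eq. (11.14)
        a = (gamma.T @ f) ** 2 / d_e            # f^T Gamma_i De^-1(i,i) Gamma_i^T f
        w = 1.0 / n_e - a.sum() / (2 * n_e * mu) + a / (2 * mu)    # Eq. (11.16)
    return f, w

# Toy data: 4 images connected as a chain by 3 hyperedges; image 0 is relevant.
H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])
f, w = joint_relevance_learning(H, y)
print(np.round(w.sum(), 6))   # 1.0
```

By construction, the update of Eq. (11.16) keeps the weights summing to one, and hyperedges whose vertices receive high relevance scores obtain larger weights.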
The semantic relevance between an image $x_i$ and the query tag $t_q$ is estimated by

$$s(x_i, t_q) = \frac{1}{n_i} \sum_{t \in T_i} s_{tag}(t_q, t), \tag{11.17}$$

which denotes the average similarity between $t_q$ and all corresponding tags of $x_i$,
where $T_i$ is the tag set of $x_i$ and $n_i$ is the number of tags in $T_i$.
$s_{tag}$ can be calculated as

$$s_{tag}(t_1, t_2) = \exp\left( -FD(t_1, t_2) \right), \tag{11.18}$$

where $FD$ represents the Flickr distance [20].

Given these similarities between each image and the query tag, the
retrieval results can be obtained accordingly. We also note that the features used in
this application can be changed according to the requirements of different tasks.


11.5 Summary 


In this chapter, we have introduced the applications of hypergraph computation in
computer vision, including visual classification, 3D object retrieval, and tag-based
social image retrieval. For classification and retrieval tasks, hypergraphs can be
used to model the high-order relationships among samples in the feature space, and
the problem can then be solved by hypergraph-based label propagation methods. The
success of hypergraphs in computer vision is due to the fact that the feature
correlations of visual data are complex and hard to explore with pairwise correlation
methods. Hypergraph computation can be further used in other computer vision
tasks, such as visual registration, visual segmentation, and gaze estimation.


References

1. Z. Zhang, H. Lin, X. Zhao, R. Ji, Y. Gao, Inductive multi-hypergraph learning and its
application on view-based 3D object classification. IEEE Trans. Image Process. 27(12),
5957–5968 (2018)
2. Y. Gao, M. Wang, D. Tao, R. Ji, Q. Dai, 3-D object retrieval and recognition with hypergraph
analysis. IEEE Trans. Image Process. 21(9), 4290–4303 (2012)
3. J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image
classification. IEEE Trans. Image Process. 21(7), 3262–3272 (2012)
4. D. Di, C. Zou, Y. Feng, H. Zhou, R. Ji, Q. Dai, Y. Gao, Generating hypergraph-based high-order
representations of whole-slide histopathological images for survival prediction. IEEE Trans.
Pattern Analy. Mach. Intell. 1–16 (2022). https://doi.org/10.1109/TPAMI.2022.3209652
5. D. Di, S. Li, J. Zhang, Y. Gao, Ranking-based survival prediction on histopathological
whole-slide images, in Proceedings of the International Conference on Medical Image Computing
and Computer-Assisted Intervention (2020), pp. 428–438
6. D. Di, J. Zhang, F. Lei, Q. Tian, Y. Gao, Big-hypergraph factorization neural network for
survival prediction from whole slide image. IEEE Trans. Image Process. 31, 1149–1160 (2022)
7. J. Bai, B. Gong, Y. Zhao, F. Lei, C. Yan, Y. Gao, Multi-scale representation learning on
hypergraph for 3D shape retrieval and recognition. IEEE Trans. Image Process. 30, 5327–5338
(2021)
8. G.Y. An, Y. Huo, S.E. Yoon, Hypergraph propagation and community selection for objects
retrieval, in Proceedings of the Advances in Neural Information Processing Systems (2021),
pp. 3596–3608
9. D. Pedronette, L. Valem, J. Almeida, R. Torres, Multimedia retrieval through unsupervised
hypergraph-based manifold ranking. IEEE Trans. Image Process. 28(12), 5824–5838 (2019)
10. L. Nong, J. Wang, J. Lin, H. Qiu, L. Zheng, W. Zhang, Hypergraph wavelet neural networks
for 3D object classification. Neurocomputing 463, 580–595 (2021)
11. S. Bai, X. Bai, Q. Tian, L.J. Latecki, Regularized diffusion process on bidirectional context for
object retrieval. IEEE Trans. Pattern Analy. Mach. Intell. 41(5), 1213–1226 (2019)
12. F. Chen, B. Li, L. Li, 3D object retrieval with graph-based collaborative feature learning. J.
Visual Commun. Image Represen. 28, 261–268 (2019)
13. Y. Gao, M. Wang, Z. Zha, J. Shen, X. Li, X. Wu, Visual-textual joint relevance learning for
tag-based social image search. IEEE Trans. Image Process. 22(1), 363–376 (2013)
14. Y. Wang, L. Zhu, X. Qian, J. Han, Joint hypergraph learning for tag-based image retrieval.
IEEE Trans. Image Process. 27(9), 4437–4451 (2018)
15. L. Chen, Y. Gao, Y. Zhang, S. Wang, B. Zheng, Scalable hypergraph-based image retrieval
and tagging system, in Proceedings of the 34th IEEE International Conference on Data
Engineering (2018), pp. 257–268
16. N. Bouhlel, G. Feki, C.B. Amar, Visual re-ranking via adaptive collaborative hypergraph
learning for image retrieval, in Proceedings of the Advances in Information Retrieval - 42nd
European Conference on IR Research (2020), pp. 511–526
17. Y. Chu, C. Feng, C. Guo, Social-guided representation learning for images via deep
heterogeneous hypergraph embedding, in Proceedings of the 2018 IEEE International Conference on
Multimedia and Expo (2018), pp. 1–6
18. H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller, Multi-view convolutional neural networks
for 3d shape recognition, in Proceedings of the IEEE International Conference on Computer
Vision (2015), pp. 945–953
19. Y. Huang, Q. Liu, S. Zhang, D. Metaxas, Image retrieval via probabilistic hypergraph ranking,
in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2010),
pp. 3376–3383
20. L. Wu, X. Hua, N. Yu, W. Ma, S. Li, Flickr distance: a relationship measure for visual concepts.
IEEE Trans. Pattern Analy. Mach. Intell. 34(5), 863–875 (2012)






Chapter 12
The DeepHypergraph Library
Abstract This chapter introduces the DeepHypergraph library, which bridges
hypergraph theory and hypergraph applications. The library supports the generation
of multiple low-order structures (such as graphs and directed graphs) and high-order
structures (such as hypergraphs and directed hypergraphs), and it provides datasets,
operations, learning methods, visualizations, etc. We first introduce the design
motivation and the overall architecture of the library. Then, we introduce the
“correlation structure” and “function library” of the DeepHypergraph library,
respectively.


12.1 Introduction 


We have designed DeepHypergraph (DHG),¹ a deep learning library built upon
PyTorch² for hypergraph computation. It is a general framework that supports both
low-order and high-order message passing, such as from vertex to vertex, from
vertex in one domain to vertex in another domain, from vertex to hyperedge, from
hyperedge to vertex, and from vertex set to vertex set. It supports the generation
of a wide variety of structures, including low-order structures (graph, directed
graph, bipartite graph, etc.) and high-order structures (hypergraph, etc.). Various
spectral-based operations (such as Laplacian-based smoothing) and spatial-based
operations (such as message passing from domain to domain) are integrated inside
the different structures. DHG also provides multiple common metrics for performance
evaluation on different tasks. A group of state-of-the-art models has been
implemented and can be easily used for research. We also provide several
visualization tools for demonstrating both low-order and high-order structures.
Besides, the dhg.experiments module (which implements Auto-ML upon Optuna³) can
automatically tune the hyperparameters of the models in training and return the
model with the best performance. In this chapter, we first introduce the correlation
structures in DHG and then introduce the function library in DHG.

¹ deephypergraph.org.
² http://pytorch.org/.
³ https://optuna.org/.

© The Author(s) 2023
Q. Dai, Y. Gao, Hypergraph Computation, Artificial Intelligence: Foundations,
Theory, and Algorithms, https://doi.org/10.1007/978-981-99-0185-2_12
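The automatic hyperparameter tuning mentioned in the introduction can be pictured as a search loop that evaluates one candidate configuration per trial and keeps the best one. The sketch below is a plain random search in standard-library Python. It does not use the DHG or Optuna APIs, and the `objective` function, the parameter names `lr` and `hidden`, and the search ranges are all hypothetical stand-ins for a real training-and-validation run.

```python
import random


def objective(params):
    """Hypothetical stand-in for 'train a model, return validation accuracy'.

    The score peaks at lr = 0.01 and hidden = 64; a real objective would
    train and evaluate a model instead.
    """
    lr_penalty = (params["lr"] - 0.01) ** 2
    size_penalty = 0.1 * abs(params["hidden"] - 64) / 64
    return 1.0 - lr_penalty - size_penalty


def random_search(objective, n_trials=50, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, -1),          # log-uniform in [1e-4, 1e-1]
            "hidden": rng.choice([16, 32, 64, 128]),  # categorical choice
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(objective)
print(best_params, best_score)
```

Dedicated tuning frameworks typically replace the uniform sampling with smarter samplers and add early stopping of poor trials, but the loop of "suggest, evaluate, keep the best" has the same shape.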


12.2 The Correlation Structures in DHG 


The core motivation for designing the DHG library is to attach the spectral-based
and spatial-based operations to each specific structure. Once a structure has been
created, the related Laplacian matrices and message passing operations with
different aggregation functions can be called and combined to manipulate different
input features. Figure 12.1 illustrates the architecture of the “correlation structure”
in DHG. Currently, the implemented correlation structures of DHG include graph,
directed graph, bipartite graph, and hypergraph. For each correlation structure,
DHG provides the corresponding basic operations, such as construction and
structure modification functions, related structure transformation functions, and
learning functions.
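To make the idea of basic operations on a correlation structure concrete, here is a minimal sketch of a hypergraph container with construction, hyperedge addition and removal, neighborhood lookup, and one structure transformation (clique expansion to a simple graph). The class `ToyHypergraph` and all of its method names are hypothetical and written for illustration only; they are not the DHG classes or their signatures.

```python
from itertools import combinations


class ToyHypergraph:
    """Minimal hypergraph container (illustrative; not the DHG class)."""

    def __init__(self, num_v, hyperedges=()):
        self.num_v = num_v
        self.hyperedges = []
        self.add_hyperedges(hyperedges)

    def add_hyperedges(self, hyperedges):
        """Add hyperedges, skipping duplicates; vertex order is irrelevant."""
        for e in hyperedges:
            e = tuple(sorted(set(e)))
            if any(v < 0 or v >= self.num_v for v in e):
                raise ValueError(f"vertex index out of range in {e}")
            if e and e not in self.hyperedges:
                self.hyperedges.append(e)

    def remove_hyperedges(self, hyperedges):
        """Remove the given hyperedges if they are present."""
        drop = {tuple(sorted(set(e))) for e in hyperedges}
        self.hyperedges = [e for e in self.hyperedges if e not in drop]

    def incidence_matrix(self):
        """Return H with H[v][j] = 1 if vertex v belongs to hyperedge j."""
        return [[1 if v in e else 0 for e in self.hyperedges]
                for v in range(self.num_v)]

    def neighbors(self, v):
        """Vertices that share at least one hyperedge with v."""
        return sorted({u for e in self.hyperedges if v in e for u in e} - {v})

    def clique_expansion(self):
        """Transform to a simple graph: connect every pair inside a hyperedge."""
        edges = set()
        for e in self.hyperedges:
            edges.update(combinations(e, 2))
        return sorted(edges)


hg = ToyHypergraph(5, [(0, 1, 2), (2, 3), (1, 4)])
hg.remove_hyperedges([(4, 1)])      # vertex order inside a hyperedge is ignored
print(hg.incidence_matrix())
print(hg.neighbors(2))
print(hg.clique_expansion())
```

Storing hyperedges as sorted tuples keeps duplicate detection simple; a tensor-based library would more likely keep a sparse incidence matrix as the primary representation.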

Fig. 12.1 The architecture of the “correlation structures” in DHG. Low-order
structures: graph, directed graph, and bipartite graph; high-order structures:
hypergraph and directed hypergraph. Basic operations: construction, add edges,
remove edges, and neighborhood. Structure transformation: graph, bipartite graph,
and hypergraph. Computation and learning: model, convolution, feature smoothing,
and message passing, grouped into spectral-based and spatial-based operations.

Most computation on these correlation structures (graph, hypergraph, etc.) can be
divided into two categories: spectral-based convolution and spatial-based message
passing. The spectral-based convolution methods, such as the typical GCN [1] and
HGNN [2], generate a Laplacian matrix for a given structure and perform vertex
feature smoothing with the generated Laplacian matrix to embed low-order and
high-order structures into vertex features. The spatial-based message passing
methods, such as the typical GraphSAGE [3], GAT [4], and HGNN+ [5], perform vertex
to vertex, vertex to hyperedge, hyperedge to vertex, and vertex set to vertex set
message passing to embed the low-order and high-order structures into vertex
features. The learned vertex features can also be pooled to generate a unified
structure feature. Finally, the learned vertex features or structure features can
be fed into many downstream tasks, such as classification, retrieval, regression,
and link prediction, and into applications including paper classification, movie
recommendation, drug discovery, etc.
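The two categories can be sketched side by side on a toy hypergraph. The code below is a standard-library illustration, not DHG code: `v2e_mean` and `e2v_mean` implement vertex-to-hyperedge and hyperedge-to-vertex message passing with mean aggregation, and `hgnn_style_smoothing` applies the normalized matrix Dv^-1/2 H De^-1 H^T Dv^-1/2 used for smoothing in HGNN [2], assuming unit hyperedge weights and omitting the learnable parameters and nonlinearity. The function names and the toy data are assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy data (not from the book): 4 vertices, 2 hyperedges,
# and one scalar feature per vertex.
NUM_V = 4
HYPEREDGES = [(0, 1, 2), (2, 3)]
X = [[1.0], [2.0], [3.0], [4.0]]


def v2e_mean(x, hyperedges):
    """Vertex -> hyperedge: each hyperedge averages its member vertices."""
    dim = len(x[0])
    return [[sum(x[v][k] for v in e) / len(e) for k in range(dim)]
            for e in hyperedges]


def e2v_mean(y, hyperedges, num_v):
    """Hyperedge -> vertex: each vertex averages its incident hyperedges."""
    dim = len(y[0])
    out = []
    for v in range(num_v):
        incident = [j for j, e in enumerate(hyperedges) if v in e]
        out.append([sum(y[j][k] for j in incident) / len(incident)
                    if incident else 0.0 for k in range(dim)])
    return out


def hgnn_style_smoothing(x, hyperedges, num_v):
    """Spectral-style smoothing Dv^-1/2 H De^-1 H^T Dv^-1/2 x, unit weights."""
    dv = [sum(1 for e in hyperedges if v in e) for v in range(num_v)]
    dim = len(x[0])
    out = [[0.0] * dim for _ in range(num_v)]
    for e in hyperedges:
        for k in range(dim):
            # Degree-normalized mean over the hyperedge, then scatter back.
            s = sum(x[u][k] / sqrt(dv[u]) for u in e) / len(e)
            for v in e:
                out[v][k] += s / sqrt(dv[v])
    return out


Y = v2e_mean(X, HYPEREDGES)            # hyperedge features
Z = e2v_mean(Y, HYPEREDGES, NUM_V)     # vertex -> vertex via hyperedges
S = hgnn_style_smoothing(X, HYPEREDGES, NUM_V)
print(Y, Z, S)
```

The two routes differ only in normalization here: chaining the two mean aggregations corresponds to Dv^-1 H De^-1 H^T, whereas the spectral form splits the vertex-degree normalization symmetrically across both sides.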


12.3 The Function Library in DHG 


To reduce the complex and repetitive code needed for learning on correlation
structures, DHG further provides a function library. As shown in Fig. 12.2, the
function library includes five parts: the data module, metric module, visualization
module, auto-ML module, and structure generators module.

In the data module, DHG integrates more than 20 public graph, bipartite graph,
and hypergraph datasets and some commonly used preprocessing functions, such
as File Loader and Normalization. By default, DHG automatically downloads
the integrated datasets and checks the integrity of the downloaded files. You can
also manually construct your own DHG-style dataset with the existing Datapipe
functions in DHG.
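Two of the preprocessing steps named above, integrity checking and normalization, can be illustrated with a short standard-library sketch. This is not DHG's implementation: the helper names `checksum_hex`, `check_integrity`, and `min_max_scale` are hypothetical, and the use of SHA-256 as the checksum is an assumption made for the example.

```python
import hashlib


def checksum_hex(data: bytes) -> str:
    """Return a SHA-256 digest as hex (the checksum algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(data: bytes, expected: str) -> bool:
    """True if the downloaded bytes match the published checksum."""
    return checksum_hex(data) == expected


def min_max_scale(rows, lo=0.0, hi=1.0):
    """Rescale each feature column to [lo, hi]; constant columns map to lo."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    scaled = []
    for row in rows:
        scaled.append([
            lo + (hi - lo) * (x - mn) / (mx - mn) if mx > mn else lo
            for x, mn, mx in zip(row, mins, maxs)
        ])
    return scaled


payload = b"vertex features and hyperedge lists"  # stand-in for a downloaded file
published = checksum_hex(payload)                 # stand-in for the published checksum
print(check_integrity(payload, published))
print(check_integrity(payload + b"!", published))

features = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]
print(min_max_scale(features))
```

Mapping constant columns to the lower bound avoids a division by zero; other conventions, such as mapping them to the midpoint, are equally defensible.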


Fig. 12.2 The architecture of the “function library” in DHG. Data: datasets (graph,
bipartite graph, and hypergraph datasets) and datapipes (downloader, file loader,
normalization, and min-max scaler). Metric: basic metrics (accuracy, precision,
recall, F1, mAP, and NDCG), task evaluators, and batch-based and epoch-based
evaluators. Auto-ML: model builder and structure builder. Structure generators:
low-order and high-order structure generators.
In the metric module, DHG provides many widely used metrics, such as
Accuracy, Recall, and mAP, for different tasks. Encapsulated evaluators for
different tasks, such as classification, retrieval, and recommendation, have also
been implemented. Besides, DHG provides structure and feature visualization
functions, an automatic hyperparameter search function, and random structure
generation functions for different applications.
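For readers unfamiliar with the metrics named above, the following standard-library sketch gives textbook definitions of accuracy, recall at k, and NDCG at k with binary relevance. The function names and signatures are assumptions for this example and are not the DHG metric API.

```python
from math import log2


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the ground-truth label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_at_k(ranked_items, relevant, k):
    """Share of relevant items that appear in the top-k of a ranking."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_items, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1.0 / log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))      # 3 of 4 labels match
print(recall_at_k([3, 1, 4, 2], {1, 2}, k=2))    # 1 of 2 relevant items in top-2
print(ndcg_at_k([3, 1, 4, 2], {1, 2}, k=4))
```

An evaluator for a task such as recommendation would typically wrap several of these functions and average them over users or batches.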


12.4 Summary 


In this chapter, we introduce the DHG library for hypergraph computation. It
supports both the generation of and learning on low-order and high-order
structures. Besides, many commonly used functions have also been integrated into
the library.
13.1 Summary of This Book


Hypergraph computation has attracted much attention and shown clear advantages
in many application fields, such as computer vision, social networks, and
biomedicine. In this book, we systematically introduce the basic knowledge,
algorithms, and applications of hypergraph computation in three parts and discuss
some recent progress in this direction.

In the first part, we mainly introduce the basic knowledge and main concepts
of hypergraphs, including the definitions and symbols of common terms and
the classification of hypergraphs. More importantly, we discuss the differences
between hypergraphs and graphs from several aspects. We then introduce
three hypergraph computation paradigms, namely, intra-hypergraph computation,
inter-hypergraph computation, and hypergraph structure computation. This part
gives a general view of the different objectives in hypergraph computation.

In the second part, we specifically introduce a series of algorithms, from
hypergraph modeling to hypergraph neural networks. In the hypergraph modeling
sections, we show how to build a hypergraph structure from the collected data. As a
typical and fundamental learning framework, label propagation on hypergraph
describes how to derive the labels for unknown data from the labels for known data
on the structure of a hypergraph. Other typical hypergraph computation tasks,
including data clustering, cost-sensitive learning, and link prediction, are also
introduced. Because the initial hypergraph structure may be inaccurate, we present
the hypergraph structure evolution methods, which optimize the hypergraph structure
on the basis of the initial structure. We further introduce the hypergraph neural
network, which integrates the neural network framework into the hypergraph
computation framework. The large-scale hypergraph chapter discusses how to deal
with large-scale data for classification and clustering applications.

In the third part, we introduce practical examples of hypergraph computation
in social media analysis, medical and biological applications, and computer vision,
including specific tasks such as recommender systems, sentiment analysis,
computer-aided diagnosis, and image classification. In these examples, we show how
to use hypergraphs for high-order correlation modeling and how to select computation
paradigms for different objectives. We further introduce the DeepHypergraph library
for hypergraph computation.


13.2 Future Work 


Although there have been many efforts to promote the development of hypergraph
computation, many open issues still need deep exploration, for instance, the
mathematical foundations of hypergraph computation, interpretability, and temporal
hypergraph modeling:


1. At present, the theory of hypergraph modeling and optimization is still far
from complete. As a flexible modeling method for high-order complex
correlations, the hypergraph's main components, i.e., the number and degree of
hyperedges, are not fixed, and how to measure the complexity of a hypergraph
structure is a problem worth further exploration. Previous investigations
have shown the superior performance of hypergraph computation in various
applications, while the fundamental reason for this improvement and how much
gain we can obtain from such high-order correlation modeling remain without a
clear answer. In many tasks, such as hypergraph matching, it is necessary to
define metrics in the hypergraph space. However, the problem is computationally
expensive when the scale of the hypergraph is very large. Therefore, efficient
hypergraph matching and other algorithms are urgently needed. Existing
hypergraph modeling methods still lack an evaluation of the quality of high-order
correlation modeling and therefore lack credibility. The relationship between
task complexity and structural complexity needs further exploration, considering
both the input data and the downstream tasks. It is expected that hypergraphs
can be generated according to the complexity of specific tasks and data, so as to
achieve more reliable hypergraph modeling and optimization performance.

2. Interpretability is also an important research area for neural network models;
its purpose is to explain black-box models through techniques such as feature
masking and visualization. Since the hypergraph structure provides additional
topological information, it brings new opportunities for the interpretability
of hypergraph neural networks. Although there has been some work on the
explanation problems of deep graph models in recent years, it is still in its
infancy. Interpretability of deep hypergraph models could be a potential road
toward better deep neural network interpretability. There are two feasible
paths to explanation techniques for hypergraph neural networks: instance-level
explanation methods and model-level explanation methods. For example, it is
possible to use different mask generation algorithms to obtain masks
corresponding to vertices, edges, or the incidence matrix and then apply
the masks as disturbances that cover the original structure information, in
order to study the effects of different disturbances on the original structure.
In addition, for hypergraphs in biochemistry, neurobiology, ecology, and
engineering, whose structure is highly correlated with their functions, how to
combine domain knowledge to improve model interpretability is also an important
issue. Finally, for text or image data, humans can easily understand the
semantic information. However, it is difficult to intuitively understand the
information in a hypergraph structure. How to visualize high-order complex
correlations for intuitive understanding remains a challenge.

3. The combination of temporal sequences and hypergraph neural networks
is also worth exploring. Recent research mainly focuses on static data and
static hypergraphs, where the data and the structure are kept fixed. However, in
real-world applications, the data may vary over time, forming a temporal
sequence, and so may the topology among the data. Therefore, the temporal
information should be considered, and temporal hypergraph neural networks
aim to combine the temporal and spatial information. According to the variation
of the data and the structure, there exist two main scenarios:


* Time sequence data with static structure. This is a common scenario in the
fields of traffic forecasting, action recognition, and anomaly detection.

* Time sequence data with evolving structure. This scenario mostly appears in
the fields of stock prediction and video relation detection.


Under the above application circumstances, temporal hypergraph modeling is
worth studying. The tasks mentioned above pose multiple challenges. For
sensor data, different types of sensors produce different types of data,
while typical hypergraph neural networks treat the data of all vertices equally.
Both temporal and spatial high-order relationships vary over time, which makes
the message passing procedure complex. New vertices/hyperedges emerge, and
old vertices/hyperedges dissolve as the structure varies, which makes it
difficult to continuously model the varying correlations and aggregate
the messages. The vertices/hyperedges may even be completely different at
different time steps, which makes representation-based methods questionable.
In order to model the temporal information, the vertex representations should
be dynamic, and therefore, the representations should be learned in a function
space rather than in the common vector space. The temporal information
from both the vertex representations and the structure topology is difficult to
extract. Considering these challenges, the temporal hypergraph still has a long
way to go and needs further exploration.
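The two scenarios in item 3 differ mainly in what is stored per time step. The sketch below shows only that data layout, using one mean-aggregation step per snapshot in standard-library Python; it does not model temporal dependencies between steps, which is the open problem described above. The function names and the toy sequences are hypothetical, and vertices are keyed by identifier so that they can emerge or dissolve between snapshots.

```python
def smooth(features, hyperedges):
    """One mean-aggregation step: vertex -> hyperedge -> vertex.

    `features` maps vertex id -> scalar; vertices outside every hyperedge
    keep their own value.
    """
    edge_feat = [sum(features[v] for v in e) / len(e) for e in hyperedges]
    out = {}
    for v, value in features.items():
        incident = [edge_feat[j] for j, e in enumerate(hyperedges) if v in e]
        out[v] = sum(incident) / len(incident) if incident else value
    return out


def run_static(feature_seq, hyperedges):
    """Scenario 1: features change over time, the structure stays fixed."""
    return [smooth(features, hyperedges) for features in feature_seq]


def run_evolving(snapshots):
    """Scenario 2: each time step carries its own vertices and hyperedges."""
    return [smooth(features, hyperedges) for features, hyperedges in snapshots]


# Hypothetical toy sequences (not from the book).
static_out = run_static(
    [{"a": 1.0, "b": 3.0, "c": 5.0}, {"a": 2.0, "b": 4.0, "c": 6.0}],
    hyperedges=[("a", "b"), ("b", "c")],
)
evolving_out = run_evolving([
    ({"a": 1.0, "b": 3.0}, [("a", "b")]),
    # Vertex "a" dissolves here; vertices "c" and "d" emerge.
    ({"b": 3.0, "c": 5.0, "d": 7.0}, [("b", "c", "d")]),
])
print(static_out)
print(evolving_out)
```

In the evolving case the outputs at different steps are defined over different vertex sets, which is exactly why a fixed-size representation per vertex becomes questionable.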


Besides the above research directions, there are also several other interesting
topics, such as big hypergraph models, hypergraph databases, and distributed
hypergraphs, which have not been introduced in detail in this book and deserve
further study.